Now that the main components of the architecture have been decided, choices must be made on the overall design.
User dashboards can be very costly for a website. Giving each user the opportunity to design their own personal website comes with the cost of storing all that information. Efforts have been made (and are still being made) to reduce this cost, but there will always be an overhead.
This overhead might be hard to estimate as it depends a lot on how the users navigate through the website. Maybe only a minority of users will use this functionality, or maybe the website will only be made of dashboards. In any case, the impact of making this feature available must be measured by:
Estimating the number of dashboards and pages that will be created
Observing the impact on the database (through JCR) in terms of size
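As a rough illustration of the first step, the arithmetic can be sketched as below. This is only a back-of-the-envelope sketch: the class, its method names, and every input figure (user count, adoption rate, pages per dashboard, average bytes per page) are hypothetical placeholders, not values measured on a real portal. The actual size impact should still be observed on the database itself.

```java
// Back-of-the-envelope sizing sketch. All names and figures are hypothetical.
final class DashboardStorageEstimate {

    /** Estimated number of dashboard pages created across the user base. */
    static long estimatePages(long registeredUsers, double adoptionRate, int pagesPerDashboard) {
        // Number of users expected to create a dashboard at all.
        long dashboards = Math.round(registeredUsers * adoptionRate);
        return dashboards * pagesPerDashboard;
    }

    /** Estimated bytes stored through JCR for the given number of pages. */
    static long estimateBytes(long pages, long averageBytesPerPage) {
        return pages * averageBytesPerPage;
    }

    public static void main(String[] args) {
        // Assumed inputs: 50,000 users, 20% adoption, 3 pages each, ~20 KB per page.
        long pages = estimatePages(50_000, 0.20, 3);
        long bytes = estimateBytes(pages, 20_000);
        System.out.println("pages=" + pages + ", bytes=" + bytes);
    }
}
```

Running the estimate with a pessimistic adoption rate (for example 1.0, where every user creates a dashboard) gives an upper bound to compare against the observed database growth.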
The JCR implementation uses Apache Lucene for indexing the data. The indexes are used to search for content, such as page nodes or WCM content.
Lucene is not cluster-ready, yet in a cluster setup each node needs to be able to search for content and therefore needs access to the Lucene indexes.
When it comes to searching, there is always a trade-off among the following aims:
Fast search
Fast indexing
The same search results on each node at the same time (consistency)
No need to ever rebuild the index from scratch
No impact on overall performance
Easy to set up (no infrastructure change)
eXo JCR, the JCR implementation used by GateIn Portal, makes it possible to configure the storage and retrieval of indexes according to the architect's priorities. For configuration details, please refer to the eXo JCR Reference Guide.
A standalone index is only suitable for a non-cluster environment. This is the easiest setup, with a combination of in-memory and file-based indexes. There is no replication involved, and therefore any entry can be found by a search as soon as it is created.
In this setup, each node keeps a local copy of the full indexes so that no network communication is needed when a search is requested on a node. The downside is that when a node indexes an item, the index update must be replicated to every other node. If a node is unavailable at that time, it may miss the update request, which leads to inconsistencies among nodes.
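The missed-update problem can be illustrated with a small in-memory model. This is a simplified sketch, not the eXo JCR implementation: the `LocalIndexSimulation` and `Node` classes are hypothetical, a plain set of item names stands in for a Lucene index, and replication is reduced to a loop over the reachable nodes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of per-node local indexes. Names are hypothetical.
final class LocalIndexSimulation {

    /** One cluster node holding its own full copy of the index. */
    static final class Node {
        final String name;
        final Set<String> index = new HashSet<>();
        boolean available = true;

        Node(String name) {
            this.name = name;
        }

        /** Searches only the local copy, so no network call is involved. */
        boolean search(String item) {
            return index.contains(item);
        }
    }

    private final List<Node> nodes = new ArrayList<>();

    Node addNode(String name) {
        Node node = new Node(name);
        nodes.add(node);
        return node;
    }

    /** Indexes the item on the origin node, then replicates to every reachable node. */
    void indexItem(Node origin, String item) {
        origin.index.add(item);
        for (Node node : nodes) {
            // An unavailable node silently misses the update in this model.
            if (node != origin && node.available) {
                node.index.add(item);
            }
        }
    }

    public static void main(String[] args) {
        LocalIndexSimulation cluster = new LocalIndexSimulation();
        Node a = cluster.addNode("node-a");
        Node b = cluster.addNode("node-b");

        b.available = false;            // node-b is down while the item is indexed
        cluster.indexItem(a, "page-1");
        b.available = true;             // node-b comes back without the update

        System.out.println("node-a finds page-1: " + a.search("page-1"));
        System.out.println("node-b finds page-1: " + b.search("page-1"));
    }
}
```

In this model the two nodes return different results for the same query until the index on the second node is repaired, which is the inconsistency described above.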
When a node is added, it has to recreate its own full index. Alternatively, a node can be set up to retrieve the information from a coordinator on each search, which makes the startup of the new node faster but degrades its runtime performance. This setup has been available since EPP 5.2.