JBoss.orgCommunity Documentation
eXo JCR persistent data container can work in two configuration modes:
Multi-database: One database for each workspace (used in standalone eXo JCR service mode)
Single-database: All workspaces persisted in one database (used in embedded eXo JCR service mode, e.g. in eXo portal)
The data container uses the JDBC driver to communicate with the actual database software, i.e. any JDBC-enabled data storage can be used with eXo JCR implementation.
Currently the data container is tested with the following configurations:
MySQL 5.0.18 MYSQL Connector/J 5.0.8
MySQL 5.1.36 MYSQL Connector/J 5.1.14
PostgresSQL 8.2.4 JDBC4 Driver, Version 8.2-507
PostgresSQL 8.3.7 JDBC4 Driver, Version 8.3-606
Oracle DB 10g R2 (10.2.0.4), JDBC Driver Oracle 10g R2 (10.2.0.4)
Oracle DB 11g R1 (11.1.0.6.0), JDBC Driver Oracle 11g R1 (11.1.0.6.0)
Oracle DB 11g R2 (11.2.0.1.0), JDBC Driver Oracle 11g R2 (11.2.0.1.0)
DB2 9.7.4 IBM Data Server Driver for JDBC and SQLJ (JCC Driver) v.9.7
MS SQL Server 2005 SP3 JDBC Driver 3.0
MS SQL Server 2008 JDBC Driver 3.0
Sybase 15.0.3 ASE Driver: Sybase jConnect JDBC driver v7 (Build 26502)
HSQLDB (2.0.0)
Each database software supports ANSI SQL standards but also has its own specifics. So, each database has its own configuration in eXo JCR as a database dialect parameter. If you need a more detailed configuration of the database, it's possible to do that by editing the metadata SQL-script files.
SQL-scripts you can obtain from jar-file exo.jcr.component.core-XXX.XXX.jar:conf/storage/. They also can be found at SVN here.
In the next two tables correspondence between the scripts and databases is shown.
MySQL DB | jcr-sjdbc.mysql.sql |
MySQL DB with utf-8 | jcr-sjdbc.mysql-utf8.sql |
MySQL DB with MyISAM* | jcr-sjdbc.mysql-myisam.sql |
MySQL DB with MyISAM and utf-8* | jcr-sjdbc.mysql-myisam-utf8.sql |
PostgresSQL | jcr-sjdbc.pqsql.sql |
Oracle DB | jcr-sjdbc.ora.sql |
DB2 | jcr-sjdbc.db2.sql |
MS SQL Server | jcr-sjdbc.mssql.sql |
Sybase | jcr-sjdbc.sybase.sql |
HSQLDB | jcr-sjdbc.sql |
MySQL DB | jcr-mjdbc.mysql.sql |
MySQL DB with utf-8 | jcr-mjdbc.mysql-utf8.sql |
MySQL DB with MyISAM* | jcr-mjdbc.mysql-myisam.sql |
MySQL DB with MyISAM and utf-8* | jcr-mjdbc.mysql-myisam-utf8.sql |
PostgresSQL | jcr-mjdbc.pqsql.sql |
Oracle DB | jcr-mjdbc.ora.sql |
DB2 | jcr-mjdbc.db2.sql |
MS SQL Server | jcr-mjdbc.mssql.sql |
Sybase | jcr-mjdbc.sybase.sql |
HSQLDB | jcr-mjdbc.sql |
In case the non-ANSI node name is used, it's necessary to use a
database with MultiLanguage support[TODO link to MultiLanguage]. Some JDBC
drivers need additional parameters for establishing a Unicode friendly
connection. E.g. under mysql it's necessary to add an additional parameter
for the JDBC driver at the end of JDBC URL. For instance:
jdbc:mysql://exoua.dnsalias.net/portal?characterEncoding=utf8
There are preconfigured configuration files for HSQLDB. Look for these files in /conf/portal and /conf/standalone folders of the jar-file exo.jcr.component.core-XXX.XXX.jar or source-distribution of eXo JCR implementation.
By default, the configuration files are located in service jars
/conf/portal/configuration.xml
(eXo services
including JCR Repository Service) and
exo-jcr-config.xml
(repositories configuration). In
eXo portal product, JCR is configured in portal web application
portal/WEB-INF/conf/jcr/jcr-configuration.xml
(JCR
Repository Service and related serivces) and repository-configuration.xml
(repositories configuration).
Read more about Repository configuration.
Please note, that JCR requires at least READ_COMMITED isolation level and other RDBMS configurations can cause some side-effects and issues. So, please, make sure proper isolation level is configured on database server side.
RDBMS reindexing feature use queries based on LIMIT and OFFSET clauses which are not enabled by default. However, you can ensure they are enabled by executing the following
$ db2set DB2_COMPATIBILITY_VECTOR=MYS $ db2stop $ db2start
Statistics is collected automatically starting from DB2 Version 9, however it is needed to launch statistics collection manually during the very first start, otherwise it could be very long. You need to run 'RUNSTATS' command
RUNSTATS ON TABLE <scheme>.<table> WITH DISTRIBUTION AND INDEXES ALL
for JCR_SITEM (or JCR_MITEM) and JCR_SVALUE (or JCR_MVALUE) tables.
MyISAM is not supported due to its lack of transaction support and integrity check, so use it only if you don't expect any support and if performances in read accesses are more important than the consistency in your use-case. This dialect is only dedicated to the community.
MySQL relies on collected statistics for keeping track of data distribution in tables and for optimizing join statements, but you can manually call 'ANALYZE' to update statistics if needed. For example
ANALYZE TABLE JCR_SITEM, JCR_SVALUE
Be aware, when using RDBMS for reindexing need to set "enable_seqscan" to "off" or "default_statistics_target" at least "50"
Though PostgreSQL server performs query optimization automatically, you can manualy call 'ANALYZE' command to collect statistics which can influence the performance. For example
ANALYZE JCR_SITEM ANALYZE JCR_SVALUE
One more mandatory JCR requirement for underlying databases is a case sensitive collation. Microsoft SQL Server both 2005 and 2008 customers must configure their server with collation corresponding to personal needs and requirements, but obligatorily case sensitive. For more information please refer to Microsoft SQL Server documentation page "Selecting a SQL Server Collation" here.
MS SQL DB server's optimizer automatically processes queries to increase performance. Optimization is based on statistical data which is collected automatically, but you can manually call Transact-SQL command 'UPDATE STATISTICS' which in very few situations may increase performance. For example
UPDATE STATISTICS JCR_SITEM UPDATE STATISTICS JCR_SVALUE
Sybase DB Server optimizer automatically processes queries to increase performance. Optimization is based on statistical data which is collected automatically, but you can manually call Transact-SQL command 'update statistics' which in very few situations may increase performance. For example
update statistics JCR_SITEM update statistics JCR_SVALUE
Oracle DB automatically collects statistics to optimize performance of queries, but you can manually call 'ANALYZE' command to start collecting statistics immediately which may improve performance. For example
ANALYZE INDEX JCR_PK_SITEM COMPUTE STATISTICS ANALYZE INDEX JCR_IDX_SITEM_PARENT_FK COMPUTE STATISTICS ANALYZE INDEX JCR_IDX_SITEM_PARENT COMPUTE STATISTICS ANALYZE INDEX JCR_IDX_SITEM_PARENT_NAME COMPUTE STATISTICS ANALYZE INDEX JCR_IDX_SITEM_PARENT_ID COMPUTE STATISTICS ANALYZE INDEX JCR_PK_SVALUE COMPUTE STATISTICS ANALYZE INDEX JCR_IDX_SVALUE_PROPERTY COMPUTE STATISTICS ANALYZE INDEX JCR_PK_SREF COMPUTE STATISTICS ANALYZE INDEX JCR_IDX_SREF_PROPERTY COMPUTE STATISTICS ANALYZE INDEX JCR_PK_SCONTAINER COMPUTE STATISTICS
You need to configure each workspace in a repository. You may have each one on different remote servers as far as you need.
First of all configure the data containers in the
org.exoplatform.services.naming.InitialContextInitializer
service. It's the JNDI context initializer which registers (binds) naming
resources (DataSources) for data containers.
For example (standalone mode, two data containers
jdbcjcr
- local HSQLDB,
jdbcjcr1
- remote MySQL):
<component> <key>org.exoplatform.services.naming.InitialContextInitializer</key> <type>org.exoplatform.services.naming.InitialContextInitializer</type> <component-plugins> <component-plugin> <name>bind.datasource</name> <set-method>addPlugin</set-method> <type>org.exoplatform.services.naming.BindReferencePlugin</type> <init-params> <value-param> <name>bind-name</name> <value>jdbcjcr</value> </value-param> <value-param> <name>class-name</name> <value>javax.sql.DataSource</value> </value-param> <value-param> <name>factory</name> <value>org.apache.commons.dbcp.BasicDataSourceFactory</value> </value-param> <properties-param> <name>ref-addresses</name> <description>ref-addresses</description> <property name="driverClassName" value="org.hsqldb.jdbcDriver"/> <property name="url" value="jdbc:hsqldb:file:target/temp/data/portal"/> <property name="username" value="sa"/> <property name="password" value=""/> </properties-param> </init-params> </component-plugin> <component-plugin> <name>bind.datasource</name> <set-method>addPlugin</set-method> <type>org.exoplatform.services.naming.BindReferencePlugin</type> <init-params> <value-param> <name>bind-name</name> <value>jdbcjcr1</value> </value-param> <value-param> <name>class-name</name> <value>javax.sql.DataSource</value> </value-param> <value-param> <name>factory</name> <value>org.apache.commons.dbcp.BasicDataSourceFactory</value> </value-param> <properties-param> <name>ref-addresses</name> <description>ref-addresses</description> <property name="driverClassName" value="com.mysql.jdbc.Driver"/> <property name="url" value="jdbc:mysql://exoua.dnsalias.net/jcr"/> <property name="username" value="exoadmin"/> <property name="password" value="exo12321"/> <property name="maxActive" value="50"/> <property name="maxIdle" value="5"/> <property name="initialSize" value="5"/> </properties-param> </init-params> </component-plugin> <component-plugins> <init-params> <value-param> <name>default-context-factory</name> <value>org.exoplatform.services.naming.SimpleContextFactory</value> </value-param> </init-params> </component>
We configure the database connection parameters:
driverClassName
, e.g.
"org.hsqldb.jdbcDriver", "com.mysql.jdbc.Driver",
"org.postgresql.Driver"
url
, e.g.
"jdbc:hsqldb:file:target/temp/data/portal",
"jdbc:mysql://exoua.dnsalias.net/jcr"
username
, e.g. "sa", "exoadmin"
password
, e.g. "", "exo12321"
There can be connection pool configuration parameters (org.apache.commons.dbcp.BasicDataSourceFactory):
maxActive
, e.g. 50
maxIdle
, e.g. 5
initialSize
, e.g. 5
and other according to Apache DBCP configuration
When the data container configuration is done, we can configure the repository service. Each workspace will be configured for its own data container.
For example (two workspaces ws
- jdbcjcr,
ws1
- jdbcjcr1):
<workspaces> <workspace name="ws" auto-init-root-nodetype="nt:unstructured"> <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer"> <properties> <property name="source-name" value="jdbcjcr"/> <property name="dialect" value="hsqldb"/> <property name="multi-db" value="true"/> <property name="max-buffer-size" value="200K"/> <property name="swap-directory" value="target/temp/swap/ws"/> </properties> </container> <cache enabled="true"> <properties> <property name="max-size" value="10K"/><!-- 10Kbytes --> <property name="live-time" value="30m"/><!-- 30 min --> </properties> </cache> <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"> <properties> <property name="index-dir" value="target/temp/index"/> </properties> </query-handler> <lock-manager> <time-out>15m</time-out><!-- 15 min --> <persister class="org.exoplatform.services.jcr.impl.core.lock.FileSystemLockPersister"> <properties> <property name="path" value="target/temp/lock/ws"/> </properties> </persister> </lock-manager> </workspace> <workspace name="ws1" auto-init-root-nodetype="nt:unstructured"> <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer"> <properties> <property name="source-name" value="jdbcjcr1"/> <property name="dialect" value="mysql"/> <property name="multi-db" value="true"/> <property name="max-buffer-size" value="200K"/> <property name="swap-directory" value="target/temp/swap/ws1"/> </properties> </container> <cache enabled="true"> <properties> <property name="max-size" value="10K"/> <property name="live-time" value="5m"/> </properties> </cache> <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"> <properties> <property name="index-dir" value="target/temp/index"/> </properties> </query-handler> <lock-manager> <time-out>15m</time-out><!-- 15 min --> <persister class="org.exoplatform.services.jcr.impl.core.lock.FileSystemLockPersister"> <properties> <property name="path" value="target/temp/lock/ws1"/> </properties> </persister> </lock-manager> </workspace> </workspaces>
source-name
: A javax.sql.DataSource name
configured in InitialContextInitializer component (was
sourceName
prior JCR 1.9);
dialect
: A database dialect, one of
"hsqldb", "mysql", "mysql-utf8", "pgsql", "oracle", "oracle-oci",
"mssql", "sybase", "derby", "db2", "db2v8" or "auto" for dialect
autodetection;
multi-db
: Enable multi-database container
with this parameter (set value "true");
max-buffer-size: A
a threshold (in bytes)
after which a javax.jcr.Value content will be swapped to a file in a
temporary storage. I.e. swap for pending changes.
swap-directory
: A path in the file system
used to swap the pending changes.
In this way, we have configured two workspace which will be persisted in two different databases (ws in HSQLDB, ws1 in MySQL).
Starting from v.1.9 repository configuration parameters supports human-readable formats of values (e.g. 200K - 200 Kbytes, 30m - 30 minutes etc)
It's more simple to configure a single-database data container. We have to configure one naming resource.
For example (embedded mode for jdbcjcr
data
container):
<external-component-plugins> <target-component>org.exoplatform.services.naming.InitialContextInitializer</target-component> <component-plugin> <name>bind.datasource</name> <set-method>addPlugin</set-method> <type>org.exoplatform.services.naming.BindReferencePlugin</type> <init-params> <value-param> <name>bind-name</name> <value>jdbcjcr</value> </value-param> <value-param> <name>class-name</name> <value>javax.sql.DataSource</value> </value-param> <value-param> <name>factory</name> <value>org.apache.commons.dbcp.BasicDataSourceFactory</value> </value-param> <properties-param> <name>ref-addresses</name> <description>ref-addresses</description> <property name="driverClassName" value="org.postgresql.Driver"/> <property name="url" value="jdbc:postgresql://exoua.dnsalias.net/portal"/> <property name="username" value="exoadmin"/> <property name="password" value="exo12321"/> <property name="maxActive" value="50"/> <property name="maxIdle" value="5"/> <property name="initialSize" value="5"/> </properties-param> </init-params> </component-plugin> </external-component-plugins>
And configure repository workspaces in repositories configuration with this one database. Parameter "multi-db" must be switched off (set value "false").
For example (two workspaces ws
- jdbcjcr,
ws1
- jdbcjcr):
<workspaces> <workspace name="ws" auto-init-root-nodetype="nt:unstructured"> <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer"> <properties> <property name="source-name" value="jdbcjcr"/> <property name="dialect" value="pgsql"/> <property name="multi-db" value="false"/> <property name="max-buffer-size" value="200K"/> <property name="swap-directory" value="target/temp/swap/ws"/> </properties> </container> <cache enabled="true"> <properties> <property name="max-size" value="10K"/> <property name="live-time" value="30m"/> </properties> </cache> <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"> <properties> <property name="index-dir" value="../temp/index"/> </properties> </query-handler> <lock-manager> <time-out>15m</time-out> <persister class="org.exoplatform.services.jcr.impl.core.lock.FileSystemLockPersister"> <properties> <property name="path" value="target/temp/lock/ws"/> </properties> </persister> </lock-manager> </workspace> <workspace name="ws1" auto-init-root-nodetype="nt:unstructured"> <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer"> <properties> <property name="source-name" value="jdbcjcr"/> <property name="dialect" value="pgsql"/> <property name="multi-db" value="false"/> <property name="max-buffer-size" value="200K"/> <property name="swap-directory" value="target/temp/swap/ws1"/> </properties> </container> <cache enabled="true"> <properties> <property name="max-size" value="10K"/> <property name="live-time" value="5m"/> </properties> </cache> <lock-manager> <time-out>15m</time-out> <persister class="org.exoplatform.services.jcr.impl.core.lock.FileSystemLockPersister"> <properties> <property name="path" value="target/temp/lock/ws1"/> </properties> </persister> </lock-manager> </workspace> </workspaces>
In this way, we have configured two workspaces which will be persisted in one database (PostgreSQL).
Repository configuration without using of the
javax.sql.DataSource
bounded in JNDI.
This case may be usable if you have a dedicated JDBC driver implementation with special features like XA transactions, statements/connections pooling etc:
You have to remove the configuration in
InitialContextInitializer
for your database
and configure a new one directly in the workspace
container.
Remove parameter "source-name" and add next lines instead. Describe your values for a JDBC driver, database url and username.
But be careful in this case JDBC driver should implement and provide connection pooling. Connection pooling is very recommended for use with JCR to prevent a database overload.
<workspace name="ws" auto-init-root-nodetype="nt:unstructured"> <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer"> <properties> <property name="dialect" value="hsqldb"/> <property name="driverClassName" value="org.hsqldb.jdbcDriver"/> <property name="url" value="jdbc:hsqldb:file:target/temp/data/portal"/> <property name="username" value="su"/> <property name="password" value=""/> ......
Workspaces can be added dynamically during runtime.
This can be performed in two steps:
Firstly,
ManageableRepository.configWorkspace(WorkspaceEntry
wsConfig)
- register a new configuration in
RepositoryContainer and create a WorkspaceContainer.
Secondly, the main step,
ManageableRepository.createWorkspace(String
workspaceName)
- creation of a new workspace.
eXo JCR provides two ways for interact with Database -
JDBCStorageConnection
that uses simple queries and
CQJDBCStorageConection
that uses complex queries
for reducing amount of database callings.
Simple queries will be used if you chose
org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer
:
<workspaces> <workspace name="ws" auto-init-root-nodetype="nt:unstructured"> <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer"> ... </workspace> </worksapces>
Complex queries will be used if you chose
org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer
:
<workspaces> <workspace name="ws" auto-init-root-nodetype="nt:unstructured"> <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer"> ... </workspace> </worksapces>
Why we should use a Complex Queries?
They are optimised to reduce amount of requests to database. |
Why we should use a Simple Queries?
Simple queries implemented in way to support as many database dialects as possible. |
Simple queries do not use sub queries, left or right joins. |
Some databases supports hints to increase query performance (like Oracle, MySQL, etc). eXo JCR have separate Complex Query implementation for Orcale dialect, that uses query hints to increase performance for few important queries.
To enable this option put next configuration property:
<workspace name="ws" auto-init-root-nodetype="nt:unstructured"> <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer"> <properties> <property name="dialect" value="oracle"/> <property name="force.query.hints" value="true" /> ......
Query hints enabled by default.
eXo JCR uses query hints only for Complex Query Oracle dialect. For all other dialects this parameter is ignored.
The current configuration of eXo JCR uses Apache DBCP connection pool
(org.apache.commons.dbcp.BasicDataSourceFactory
).
It's possible to set a big value for maxActive parameter in
configuration.xml
. That means usage of lots of TCP/IP
ports from a client machine inside the pool (i.e. JDBC driver). As a
result, the data container can throw exceptions like "Address already in
use". To solve this problem, you have to configure the client's machine
networking software for the usage of shorter timeouts for opened TCP/IP
ports.
Microsoft Windows has MaxUserPort
,
TcpTimedWaitDelay
registry keys in the node
HKEY_LOCAL_MACHINESYSTEMCurrentControlSetServicesTcpipParameters
,
by default these keys are unset, set each one with values like
these:
"TcpTimedWaitDelay"=dword:0000001e, sets TIME_WAIT parameter to 30 seconds, default is 240.
"MaxUserPort"=dword:00001b58, sets the maximum of open ports to 7000 or higher, default is 5000.
A sample registry file is below:
Windows Registry Editor Version 5.00 [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters] "MaxUserPort"=dword:00001b58 "TcpTimedWaitDelay"=dword:0000001e