Chapter 14. JPA Connector

14.1. Basic Model
14.2. Simple Model

This connector stores a graph of any structure or size in a relational database, using a JPA provider on top of a JDBC driver. Currently this connector relies upon some Hibernate-specific capabilities. The schema of the database is dictated by this connector and is optimized for storing a graph structure. (In other words, this connector does not expose as a graph the data in an existing database with an arbitrary schema.)

The JpaSource class provides a number of JavaBean properties that control its behavior:

Table 14.1. JpaSource properties

Property	Description
autoGenerateSchema	Sets the Hibernate setting that dictates what Hibernate does with the database schema upon first connection. Valid values are as follows (though the value is not checked): "`create`" - Create the database schema objects when the `EntityManagerFactory` is created (actually when Hibernate's SessionFactory is created by the entity manager factory). If a file named "import.sql" exists in the root of the class path (e.g., '/import.sql'), Hibernate will read and execute the SQL statements in this file after it has created the database objects. Note that Hibernate first deletes all tables, constraints, or any other database objects that are going to be created in the process of building the schema. "`create-drop`" - Same as "`create`", except that the schema will be dropped after the `EntityManagerFactory` is closed. "`update`" - Attempts to update the database structure to the current mapping (but does not read and invoke the SQL statements from "import.sql"). Use with caution. "`validate`" - Validates the existing schema against the current entity configuration, but does not make any changes to the schema (and does not read and invoke the SQL statements from "import.sql"). This is often the proper setting to use in production, and thus this is the default value.
cacheTimeToLiveInMilliseconds	Optional property that defines the maximum time in milliseconds that any information returned by this connector is allowed to be cached before being considered invalid. If this property is not set, the default value of "600000" milliseconds, or 10 minutes, is used.
compressData	An advanced boolean property that dictates whether large binary and string values should be stored in a compressed form. Setting this value only affects how new records are stored; records can always be read regardless of the value of this setting. The default value is "true".
creatingWorkspaceAllowed	Optional property that defines whether clients can create additional workspaces. The default value is "true".
dialect	Required property that defines the dialect of the database. This must match one of the Hibernate dialect names, and must correspond to the type of driver being used.
dataSourceJndiName	The JNDI name of the JDBC DataSource instance that should be used. If not specified, the other driver properties must be set.
driverClassloaderName	The name of the class loader or classpath that should be used to load the JDBC driver class. This is not required if the DataSource is found in JNDI.
driverClassName	The name of the JDBC driver class. This is not required if the DataSource is found in JNDI, but is required otherwise.
idleTimeInSecondsBeforeTestingConnections	The number of seconds that a connection may remain idle in the pool before it is tested to ensure it is still valid. The default is "180" seconds (or 3 minutes).
largeValueSizeInBytes	An advanced numeric property that sets the size, in bytes, at which property values are considered to be "large values". Depending upon the model, large property values may be stored in a centralized area and keyed by a secure hash of the value. This is a space and performance optimization that stores each unique large value only once. The default value is "1024" bytes, or 1 kilobyte.
maximumConnectionsInPool	The maximum number of connections that may be in the connection pool. The default is "5".
maximumConnectionIdleTimeInSeconds	The maximum number of seconds that a connection should remain in the pool before being closed. The default is "600" seconds (or 10 minutes).
maximumSizeOfStatementCache	The maximum number of statements that should be cached. Statement caching can be disabled by setting this property to "0". The default is "100".
minimumConnectionsInPool	The minimum number of connections that will be kept in the connection pool. The default is "0".
model	An advanced property that dictates the type of storage schema that is used. Currently, the only supported values are "Basic" and "Simple". The Basic model supports a read-only mode (q.v., the "updatesAllowed" property) and database-level enforcement of referential integrity (q.v., the "referentialIntegrityEnforced" property below), but does not fully support all JCR functions. As a result, the Simple model is now the default model, but DNA repositories that were created under the Basic model will continue to use the "Basic" model regardless of the value of this property. Repositories can be converted from the Basic model to the Simple model by exporting them to an XML file as a system view through the JCR interface and then importing them into a new repository created with the model property set to "Simple".
name	The name of the repository source, which is used by the `RepositoryService` when obtaining a RepositoryConnection by name.
nameOfDefaultWorkspace	Optional property that is initialized to an empty string and which defines the name for the workspace that will be used by default if none is specified.
numberOfConnectionsToAcquireAsNeeded	The number of connections that should be added to the pool when there are not enough to be used. The default is "1".
password	The password that should be used when creating JDBC connections using the JDBC driver class. This is not required if the DataSource is found in JNDI.
predefinedWorkspaceNames	Optional property that, if used, defines names of the workspaces that are predefined and need not be created before being used. This can be coupled with a "false" value for the "creatingWorkspaceAllowed" property to allow the use of only predefined workspaces.
referentialIntegrityEnforced (Basic Model Only)	An advanced boolean property that dictates whether the database should enforce referential integrity ("true") or not ("false"). While referential integrity does help to ensure the consistency of the records, it adds work to update operations and can impact performance. The Simple model (q.v., the "model" property above) ignores this property and does not support this feature. The default value is "true".
retryLimit	Optional property that, if used, defines the number of times that any single operation on a RepositoryConnection to this source should be retried following a communication failure. The default value is "0".
rootNodeUuid	Optional property that, if used, defines the UUID of the root node in the repository. If not used, then a new UUID is generated.
updatesAllowed	Determines whether the content in the database can be updated ("true") or may only be read ("false"). The default value is "true".
url	The URL that should be used when creating JDBC connections using the JDBC driver class. This is not required if the DataSource is found in JNDI.
username	The username that should be used when creating JDBC connections using the JDBC driver class. This is not required if the DataSource is found in JNDI.
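The compressData row says that records can always be read back no matter how the setting is currently configured. One simple way to get that guarantee is to make every stored record self-describing. The following is a minimal sketch under that assumption only: the class name, the one-byte header, and the use of GZIP are hypothetical illustration choices, not the connector's actual storage format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedRecordSketch {
    // Hypothetical one-byte header recording how the payload was written.
    static final byte PLAIN = 0;
    static final byte GZIP = 1;

    /** Writes a record, compressing the payload only when compressData is true. */
    public static byte[] write(byte[] payload, boolean compressData) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(compressData ? GZIP : PLAIN);
        try {
            if (compressData) {
                // Closing the GZIP stream writes the GZIP trailer into 'out'.
                try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
                    gzip.write(payload);
                }
            } else {
                out.write(payload, 0, payload.length);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Reads a record using its header, whatever the current compressData setting is. */
    public static byte[] read(byte[] record) {
        byte[] body = Arrays.copyOfRange(record, 1, record.length);
        if (record[0] == PLAIN) {
            return body;
        }
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] value = "a large, repetitive string value ".repeat(100).getBytes(StandardCharsets.UTF_8);
        byte[] older = write(value, true);   // written while compressData was "true"
        byte[] newer = write(value, false);  // written after it was changed to "false"
        System.out.println(Arrays.equals(read(older), value)); // true
        System.out.println(Arrays.equals(read(newer), value)); // true
        System.out.println(older.length < newer.length);       // true
    }
}
```

Because the reader looks only at the header, flipping the setting changes how new records are written without making older records unreadable.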

One way to configure the JPA connector is to create a JcrConfiguration instance with a repository source that uses the JpaSource class. For example:

JcrConfiguration config = ...
config.repositorySource("JPA Store")
      .usingClass(JpaSource.class)
      .setDescription("The database store for our content")
      .setProperty("dialect", "org.hibernate.dialect.MySQLDialect")
      .setProperty("dataSourceJndiName", "java:/MyDataSource")
      .setProperty("defaultWorkspaceName", "My Default Workspace")
      .setProperty("autoGenerateSchema", "validate");

Of course, setting other more advanced properties would entail calling setProperty(...) for each. Since almost all of the properties have acceptable default values, however, we don't need to set very many of them.

Another way to configure the JPA connector is to create a JcrConfiguration instance and load an XML configuration file that contains a repository source that uses the JpaSource class. For example, a file named configRepository.xml can be created with these contents:

<?xml version="1.0" encoding="UTF-8"?>
<configuration xmlns:dna="http://www.jboss.org/dna/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0">
    <!--
    Define the sources for the content.  These sources are directly accessible using the
    DNA-specific Graph API.  In fact, this is how the DNA JCR implementation works.  You
    can think of these as being similar to JDBC DataSource objects, except that they expose
    graph content via the Graph API instead of records via SQL or JDBC.
    -->
    <dna:sources jcr:primaryType="nt:unstructured">
        <!--
        The 'JPA Store' repository is a JPA source with a single default workspace (though
        others could be created, too).
        -->
        <dna:source jcr:name="JPA Store"
                    dna:classname="org.jboss.dna.graph.connector.store.jpa.JpaSource"
                    dna:description="The database store for our content"
                    dna:dialect="org.hibernate.dialect.MySQLDialect"
                    dna:dataSourceJndiName="java:/MyDataSource"
                    dna:defaultWorkspaceName="default"
                    dna:autoGenerateSchema="validate"/>
    </dna:sources>

    <!-- MIME type detectors and JCR repositories would be defined below -->
</configuration>

The configuration can then be loaded from Java like this:

JcrConfiguration config = new JcrConfiguration().loadFrom("/configRepository.xml");

DNA users who prefer not to give DDL privileges to the DNA database user for this connector can use the DNA JPA DDL generation tool to create the proper DDL files for their database dialect. This tool is packaged as an executable jar in the utils/dna-jpa-ddl-gen subproject and can be executed with the following syntax:

java -jar <jar_name> -dialect <dialect name> -model <model_name> [-out <path to output directory>]

The dialect and model parameters should match the values of the dialect and model properties specified for the JPA connector.

Running this executable will create two files in the output directory (or the current directory if no output directory was specified): create.dna-jpa-connector.ddl and drop.dna-jpa-connector.ddl. The former contains the DDL to create or replace the tables, foreign keys, indices, and sequences needed by the JPA connector and the latter contains the DDL to drop any tables, foreign keys, indices, and sequences needed by the JPA connector.

14.1. Basic Model

This database schema model stores node properties as opaque records and children as transparent records. Large property values are stored separately.

The set of tables used in this model includes:

Workspaces - the set of workspaces and their names.
Namespaces - the set of namespace URIs used in paths, property names, and property values.
Properties - the properties for each node, stored in a serialized (and optionally compressed) form.
Large values - property values larger than a certain size will be broken out into this table, where they are tracked by their SHA-1 hash and shared by all properties that have that same value. The values are stored in a binary (and optionally compressed) form.
Children - the children for each node, where each child is represented by a separate record. This approach makes it possible to efficiently work with nodes containing large numbers of children, where adding and removing child nodes is largely independent of the number of children. Also, working with properties is completely independent of the number of child nodes.
ReferenceChanges - the references from one node to another
Subgraph - a working area for efficiently computing the space of a subgraph; see below
Options - the parameters for this store's configuration (common to all models)
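The "Large values" table described above behaves like a content-addressed store. The following minimal sketch illustrates that idea under stated assumptions: an in-memory map stands in for the table, the hypothetical `PropertyValueRef` record stands in for what a property record holds, and the exact size boundary (strictly larger than `largeValueSizeInBytes`) is assumed. It shows how keying large values by their SHA-1 hash lets identical values on different nodes share one stored copy.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class LargeValueStoreSketch {
    /** Hypothetical stand-in for a property record: an inline value or a large-value key. */
    public record PropertyValueRef(byte[] inlineValue, String largeValueKey) {
        public boolean isLarge() {
            return largeValueKey != null;
        }
    }

    private final int largeValueSizeInBytes;
    // Stand-in for the "Large values" table: SHA-1 hex string -> value bytes.
    private final Map<String, byte[]> largeValues = new HashMap<>();

    public LargeValueStoreSketch(int largeValueSizeInBytes) {
        this.largeValueSizeInBytes = largeValueSizeInBytes;
    }

    public static String sha1Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 must be available on every Java platform", e);
        }
    }

    /** Stores a property value; "larger than" the threshold is an assumed boundary. */
    public PropertyValueRef store(byte[] value) {
        if (value.length <= largeValueSizeInBytes) {
            return new PropertyValueRef(value, null);  // small values stay in the property record
        }
        String key = sha1Hex(value);
        largeValues.putIfAbsent(key, value);           // each unique large value is stored once
        return new PropertyValueRef(null, key);
    }

    public int largeValueCount() {
        return largeValues.size();
    }

    public static void main(String[] args) {
        LargeValueStoreSketch store = new LargeValueStoreSketch(1024);  // the documented default
        byte[] big = "x".repeat(4096).getBytes(StandardCharsets.UTF_8);
        PropertyValueRef onNodeA = store.store(big);
        PropertyValueRef onNodeB = store.store(big.clone());           // same content, different array
        PropertyValueRef small = store.store("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(onNodeA.largeValueKey().equals(onNodeB.largeValueKey())); // true
        System.out.println(store.largeValueCount());                                // 1
        System.out.println(small.isLarge());                                        // false
    }
}
```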

This database model contains two tables that are used in an efficient mechanism to find all of the nodes in the subgraph below a certain node. This process starts by creating a record for the subgraph query, and then proceeds by executing a join to find all the children of the top-level node, and inserting them into the database (in a working area associated with the subgraph query). Then, another join finds all the children of those children and inserts them into the same working area. This continues until the maximum depth has been reached, or until there are no more children (whichever comes first). All of the nodes in the subgraph are then represented by records in the working area, and can be used to quickly and efficiently work with the subgraph nodes. When finished, the mechanism deletes the records in the working area associated with the subgraph query.

This subgraph query mechanism is extremely efficient: it performs one join/insert statement per level of the subgraph, so the number of statements depends only on the depth of the subgraph and is completely independent of the number of nodes it contains. For example, consider a subgraph of node A, where A has 10 children, and each child contains 10 children, and each grandchild contains 10 children. This subgraph has a total of 1111 nodes (1 root + 10 children + 10*10 grandchildren + 10*10*10 great-grandchildren). Finding the nodes in this subgraph would normally require 1 query per node (in other words, 1111 queries). But with this subgraph query mechanism, all of the nodes in the subgraph can be found with 1 insert plus 4 additional join/inserts.
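To make the statement count concrete, here is a minimal in-memory sketch of the level-by-level process. The map from parent to child identifiers stands in for the children table, and each loop iteration stands in for one join/insert statement; the class and method names are hypothetical, and the real connector performs these steps in SQL inside the database rather than in Java collections. The `maxDepth` parameter models the optional maximum depth.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SubgraphQuerySketch {
    /** The nodes placed in the working area, plus how many set-based statements were issued. */
    public record Result(Set<String> nodes, int statements) {}

    /**
     * Models the working-area approach: one insert for the top node, then one
     * join/insert per level that adds the children of every node found at the
     * previous level. 'children' stands in for the children table.
     */
    public static Result computeSubgraph(Map<String, List<String>> children, String top, int maxDepth) {
        Set<String> workingArea = new LinkedHashSet<>();
        workingArea.add(top);  // the initial insert
        int statements = 1;
        List<String> previousLevel = List.of(top);
        for (int depth = 1; depth <= maxDepth; depth++) {
            // One join/insert: find all children of the previous level in one step.
            List<String> nextLevel = new ArrayList<>();
            for (String parent : previousLevel) {
                nextLevel.addAll(children.getOrDefault(parent, List.of()));
            }
            statements++;
            if (nextLevel.isEmpty()) {
                break;  // no more children, so the subgraph is complete
            }
            workingArea.addAll(nextLevel);
            previousLevel = nextLevel;
        }
        return new Result(workingArea, statements);
    }

    /** Builds a tree in which every node above the given depth has 'fanOut' children. */
    public static Map<String, List<String>> uniformTree(String root, int fanOut, int depth) {
        Map<String, List<String>> children = new HashMap<>();
        List<String> level = List.of(root);
        for (int d = 0; d < depth; d++) {
            List<String> next = new ArrayList<>();
            for (String parent : level) {
                List<String> kids = new ArrayList<>();
                for (int i = 0; i < fanOut; i++) {
                    kids.add(parent + "/" + i);
                }
                children.put(parent, kids);
                next.addAll(kids);
            }
            level = next;
        }
        return children;
    }

    public static void main(String[] args) {
        // Node A: 10 children, each with 10 children, each of those with 10 children.
        Map<String, List<String>> children = uniformTree("A", 10, 3);
        Result all = computeSubgraph(children, "A", Integer.MAX_VALUE);
        // Prints "1111 nodes, 5 statements": 1 insert plus 4 join/inserts (the 4th finds nothing).
        System.out.println(all.nodes().size() + " nodes, " + all.statements() + " statements");
    }
}
```

With `maxDepth` set to 1, the same call stops after a single join/insert and finds 11 nodes, which corresponds to a depth-limited request that does not need the complete subgraph.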

This mechanism has the added benefit that the set of nodes in the subgraph is kept in a working area in the database, meaning the nodes don't have to be pulled into memory.

Subgraph queries are used to efficiently process a number of different requests, including ReadBranchRequest, DeleteBranchRequest, MoveBranchRequest, and CopyBranchRequest. Processing each of these kinds of requests requires knowledge of the subgraph, and in fact all but the ReadBranchRequest need to know the complete subgraph.

14.2. Simple Model

This database schema model stores node properties as opaque records in the same row as transparent values like the node's namespace, local name, and same-name-sibling index. Large property values are stored separately. It is a small evolution of the design from the Basic model.

The set of tables used in this model includes:

Workspaces - the set of workspaces and their names.
Namespaces - the set of namespace URIs used in paths, property names, and property values.
Nodes - the nodes in the repository, where each node and its properties are represented by a single record. This approach makes it possible to efficiently work with nodes containing large numbers of children, where adding and removing child nodes is largely independent of the number of children. Since the primary consumer of DNA graph information is the JCR layer, and the JCR layer always retrieves the nodes' properties for retrieved nodes, the properties have been moved in-row with the nodes. Properties are still stored in an opaque, serialized (and optionally compressed) form.
Large values - property values larger than a certain size will be broken out into this table, where they are tracked by their SHA-1 hash and shared by all properties that have that same value. The values are stored in a binary (and optionally compressed) form. This is equivalent to the Basic model's approach for storing large values.
Subgraph - a working area for efficiently computing the space of a subgraph; see below
Options - the parameters for this store's configuration (common to all models)

Just like the Basic model, this model contains two tables that are used in an efficient mechanism to find all of the nodes in the subgraph below a certain node. The subgraph tables work so similarly in the Simple model that the description from the Basic model still applies.

In the Simple model, subgraph queries are used to efficiently process a number of different requests, including ReadBranchRequest and DeleteBranchRequest. Processing each of these kinds of requests requires knowledge of the subgraph, and in fact all but the ReadBranchRequest need to know the complete subgraph.