The JCR API defines a mechanism for applications to query a repository for content that meets application-defined criteria. This is done by creating a query in one of several languages, and then processing the results to find the nodes and/or property values requested by the query. The earlier JCR 1.0 specification defined two languages:
- XPath is a subset of the standard XPath 2.0 query language for XML documents, and relies upon the repository content being semantically similar to an XML document. Support for this language was required by the JCR 1.0 specification.
- JCR-SQL is a query language that is based upon SQL, but does not support many of the SQL expressions. This language was optional.
However, the JCR 2.0 specification deprecated these older languages, and instead defined two other query languages:
- JCR-SQL2 is an SQL-like query language that is an improvement over the original and now deprecated JCR-SQL language. JCR-SQL2 is much more similar to SQL and has support for joins, richer expressions, and full-text search.
- JCR-JQOM is a language for programmatically defining a query with Java objects, and is referred to as the JCR Query Object Model (or JQOM).
Most JCR 2.0 implementations support all four of these languages, even though the JCR-SQL and XPath languages were deprecated in the JCR 2.0 specification. Some implementations, including ModeShape, translate all the expression-based languages into the JCR-JQOM form, enabling all queries to be processed in exactly the same way.
Grammars for all of the standard JCR query languages (and one non-standard query language supported by ModeShape) are described in significant detail in the "Query language grammars" section.
When querying the content of a workspace, an implementation always evaluates the query against the persisted content of the workspace, and never considers any of the transient changes made by a session. To reinforce this idea, JCR defines a javax.jcr.query.QueryManager that can be obtained from the session's javax.jcr.Workspace instance.
The QueryManager interface defines methods for creating Query objects, executing queries, storing queries (not results) as Nodes in the repository, and reconstituting queries that were stored on Nodes. Querying a repository generally follows this pattern:
- Obtain the session's query manager
- Create a query using a particular language
- Execute the query and get the results
- Iterate over the nodes or rows in the results
This is demonstrated with the following sample:
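The following is a minimal sketch of that pattern rather than a definitive implementation. The class name `QueryPatternExample` and the method name `printResults` are hypothetical, and the caller is assumed to supply a live `Session`, a query expression, and a language name:

```java
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.Value;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;
import javax.jcr.query.QueryResult;
import javax.jcr.query.Row;
import javax.jcr.query.RowIterator;

// Hypothetical helper class, used only for illustration.
public class QueryPatternExample {

    public static void printResults(Session session, String expression, String language)
            throws RepositoryException {
        // 1. Obtain the session's query manager
        QueryManager queryManager = session.getWorkspace().getQueryManager();

        // 2. Create a query using a particular language
        Query query = queryManager.createQuery(expression, language);

        // 3. Execute the query and get the results
        QueryResult result = query.execute();

        // 4. Iterate over the rows in the results
        String[] columnNames = result.getColumnNames();
        RowIterator rows = result.getRows();
        while (rows.hasNext()) {
            Row row = rows.nextRow();
            for (String columnName : columnNames) {
                // A value may be null when the node has no such property
                Value value = row.getValue(columnName);
                System.out.println(columnName + " = " + (value == null ? null : value.getString()));
            }
        }
    }
}
```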
Unlike JDBC, there's no need to close the result.
Let's look at each part in more detail.
Use the QueryManager to create a query:
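As a sketch, assuming the application already has a `Session` plus the query expression and language name as strings (the class and method names here are hypothetical):

```java
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;

// Hypothetical helper class, used only for illustration.
public class CreateQueryExample {

    public static Query create(Session session, String expression, String language)
            throws RepositoryException {
        // The query manager is obtained from the session's workspace
        QueryManager queryManager = session.getWorkspace().getQueryManager();
        // First parameter: the query expression; second parameter: the language name
        return queryManager.createQuery(expression, language);
    }
}
```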
The javax.jcr.query.Query interface defines constants for each of the standard query languages, and these can be used in the second parameter to specify the language. If an implementation defines additional, non-standard languages, then the implementation-specific language name would be used instead. Here's an example of creating a query and using the constant for the JCR-SQL2 language:
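A sketch of what this might look like is shown here. The class name and the query expression (which selects all `nt:file` nodes) are assumptions chosen for illustration:

```java
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;

// Hypothetical helper class, used only for illustration.
public class CreateSql2QueryExample {

    public static Query createFileQuery(Session session) throws RepositoryException {
        // Illustrative JCR-SQL2 expression; substitute your own
        String expression = "SELECT * FROM [nt:file]";

        QueryManager queryManager = session.getWorkspace().getQueryManager();
        // Query.JCR_SQL2 is the constant for the JCR-SQL2 language. The other
        // standard constants are Query.JCR_JQOM and the deprecated Query.XPATH
        // and Query.SQL.
        return queryManager.createQuery(expression, Query.JCR_SQL2);
    }
}
```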
Note that the "createQuery" method will throw an InvalidQueryException if the query expression is not well-formed according to the specified language.
|Be sure you're specifying the correct language. The JCR-SQL and JCR-SQL2 languages seem pretty similar, but they actually use different syntax for identifiers. Thus, defining a JCR-SQL2 query but accidentally specifying the JCR-SQL language will often result in a bizarre exception message.|
Once the query expression has been accepted and your application has a Query object, the query must be executed to produce results. This is straightforward:
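A minimal sketch, assuming a `Query` object that was created earlier (the class and method names are hypothetical):

```java
import javax.jcr.RepositoryException;
import javax.jcr.query.Query;
import javax.jcr.query.QueryResult;

// Hypothetical helper class, used only for illustration.
public class ExecuteQueryExample {

    public static QueryResult run(Query query) throws RepositoryException {
        // Executes the query against the persisted content of the workspace
        return query.execute();
    }
}
```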
Again, this might throw an exception if there was a problem executing the query.
There are two ways of accessing the results: by the nodes that satisfied the criteria, or as a table with rows of property values as specified in the query's SELECT expression.
Accessing the result set as a table is very similar to accessing the result set of an SQL query through JDBC. The query identifies which properties (i.e., columns) on which node types (i.e., tables) should be selected, and the QueryResult contains a row for each node that satisfies the query's criteria. Applications can obtain the names of the columns and can iterate over the result's rows to obtain the actual values in each column. Note that if the query involves a join, the columns may correspond to properties on different node types (i.e., different selectors).
The first step is to obtain the list of column names:
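A sketch of this step, assuming a `QueryResult` obtained from an executed query (the class and method names are hypothetical):

```java
import javax.jcr.RepositoryException;
import javax.jcr.query.QueryResult;

// Hypothetical helper class, used only for illustration.
public class ColumnNamesExample {

    public static void printColumnNames(QueryResult result) throws RepositoryException {
        // The column names appear in the order defined by the query
        String[] columnNames = result.getColumnNames();
        for (String columnName : columnNames) {
            System.out.println(columnName);
        }
    }
}
```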
Note that if any properties (or columns) were aliased, the alias will appear in the column names.
Then, the next step is to iterate over the rows in the results and obtain the values for each column, either by position or by name. The following example shows how they can be accessed by position (or array index):
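The sketch below assumes a `QueryResult` from an executed query; the class and method names are hypothetical. It relies on the values in each row being in the same order as the column names:

```java
import javax.jcr.RepositoryException;
import javax.jcr.Value;
import javax.jcr.query.QueryResult;
import javax.jcr.query.Row;
import javax.jcr.query.RowIterator;

// Hypothetical helper class, used only for illustration.
public class ValuesByPositionExample {

    public static void printRows(QueryResult result) throws RepositoryException {
        String[] columnNames = result.getColumnNames();
        RowIterator rows = result.getRows();
        while (rows.hasNext()) {
            Row row = rows.nextRow();
            // All of the row's values, in the same order as the column names
            Value[] values = row.getValues();
            for (int i = 0; i < values.length; i++) {
                Value value = values[i];
                // A value may be null when the node has no such property
                String text = (value == null) ? null : value.getString();
                System.out.println(columnNames[i] + " = " + text);
            }
        }
    }
}
```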
Here is an example of getting the values for each column by the column's name:
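This sketch again assumes a `QueryResult` from an executed query, with hypothetical class and method names:

```java
import javax.jcr.RepositoryException;
import javax.jcr.Value;
import javax.jcr.query.QueryResult;
import javax.jcr.query.Row;
import javax.jcr.query.RowIterator;

// Hypothetical helper class, used only for illustration.
public class ValuesByNameExample {

    public static void printRows(QueryResult result) throws RepositoryException {
        String[] columnNames = result.getColumnNames();
        RowIterator rows = result.getRows();
        while (rows.hasNext()) {
            Row row = rows.nextRow();
            for (String columnName : columnNames) {
                // Look up the value in this row by the column's name
                Value value = row.getValue(columnName);
                String text = (value == null) ? null : value.getString();
                System.out.println(columnName + " = " + text);
            }
        }
    }
}
```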
Again, there's no need to close the result.
Each row is given a relative score that ranks how well that particular row satisfied the criteria. The magnitude of the score is implementation-dependent, but a higher relative value does signal that the row was a better match than other rows with lower scores. The score can be obtained for each row using the "getScore()" method:
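A minimal sketch, assuming the results come from a query with a single selector (the class and method names are hypothetical):

```java
import javax.jcr.RepositoryException;
import javax.jcr.query.QueryResult;
import javax.jcr.query.Row;
import javax.jcr.query.RowIterator;

// Hypothetical helper class, used only for illustration.
public class RowScoreExample {

    public static void printScores(QueryResult result) throws RepositoryException {
        RowIterator rows = result.getRows();
        while (rows.hasNext()) {
            Row row = rows.nextRow();
            // Only the relative ordering of scores is meaningful, not the magnitude
            double score = row.getScore();
            System.out.println("score = " + score);
        }
    }
}
```

For queries with more than one selector, the Row interface also defines a "getScore(String)" method that takes a selector name.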
|Some implementations, including ModeShape, will include the score in the result columns.|
Even when accessing the results as a table of rows, it's still possible to get the underlying node that corresponds to the property values appearing in the results. When a query has one selector (that is, does not use joins), every row will contain the properties from a single node, which can be accessed from the Row. Note that the Node instance will be from the Session, and so the node may have different, transient changes to some of the properties and no longer match the persisted values that the criteria were evaluated against during processing.
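For a single-selector query, this might look like the following sketch (the class and method names are hypothetical):

```java
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.query.QueryResult;
import javax.jcr.query.Row;
import javax.jcr.query.RowIterator;

// Hypothetical helper class, used only for illustration.
public class RowNodeExample {

    public static void printNodePaths(QueryResult result) throws RepositoryException {
        RowIterator rows = result.getRows();
        while (rows.hasNext()) {
            Row row = rows.nextRow();
            // Valid only when the query has a single selector (no joins)
            Node node = row.getNode();
            // The node belongs to the session, so it reflects any transient changes
            System.out.println(node.getPath());
        }
    }
}
```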
If the query defined more than one selector (that is, used at least one join), then every row will correspond to a node from each selector. In this case, there is no single node and the "getNode()" method described above will throw an exception; instead, the caller should use the "getNode(String)" method that takes a selector name. Here's a code fragment that shows how to get the node for each selector for a particular row:
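The sketch below assumes a `QueryResult` and one of its rows; the class and method names are hypothetical. The null check is a defensive assumption, for example for rows produced by an outer join where one side has no matching node:

```java
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.query.QueryResult;
import javax.jcr.query.Row;

// Hypothetical helper class, used only for illustration.
public class SelectorNodesExample {

    public static void printNodesForRow(QueryResult result, Row row) throws RepositoryException {
        // The selector names are those defined by the query (e.g., via "AS" aliases)
        for (String selectorName : result.getSelectorNames()) {
            Node node = row.getNode(selectorName);
            // Defensive check (an assumption): a selector may have no node in this row
            String path = (node == null) ? null : node.getPath();
            System.out.println(selectorName + " -> " + path);
        }
    }
}
```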
Queries may also use a self-join, which is a join whose criteria specify that the node on both sides of the join must be the same node. A self-join is useful, for example, when selecting properties defined on two node types (perhaps a primary type and a mixin) where those properties must be on the same node. Here's a JCR-SQL2 query that uses a self-join:
This query returns all [nt:file] nodes that have an [acme:taggable] mixin with a non-null [acme:tagName].
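One way such a query might be written is sketched below as a Java string constant, so that it can be passed to "createQuery" together with the JCR-SQL2 language constant. The class name `SelfJoinQuery` is hypothetical, the selector names "file" and "tagged" are illustrative, and the [acme:taggable] mixin with its [acme:tagName] property is assumed to be registered in the repository:

```java
// Hypothetical holder class, used only for illustration.
public final class SelfJoinQuery {

    // ISSAMENODE(file, tagged) is the join condition that makes this a self-join:
    // both selectors must refer to the same node.
    public static final String STATEMENT =
            "SELECT * "
          + "FROM [nt:file] AS file "
          + "JOIN [acme:taggable] AS tagged ON ISSAMENODE(file, tagged) "
          + "WHERE tagged.[acme:tagName] IS NOT NULL";

    private SelfJoinQuery() {
    }

    public static void main(String[] args) {
        System.out.println(STATEMENT);
    }
}
```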
In this case, the Node object returned for the two selectors, "file" and "tagged", will be the same node:
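A sketch of how this could be checked for a single row, assuming the query named its selectors "file" and "tagged" (the class and method names are hypothetical):

```java
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.query.Row;

// Hypothetical helper class, used only for illustration.
public class SelfJoinNodesExample {

    public static boolean selectorsShareNode(Row row) throws RepositoryException {
        Node fileNode = row.getNode("file");
        Node taggedNode = row.getNode("tagged");
        // isSame(...) compares repository identity rather than Java object identity
        return fileNode.isSame(taggedNode);
    }
}
```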
Just as every row has one or more nodes, the Row interface provides an easy way to access the paths of those nodes without having to first get the Node objects. And just like the "getNode()" and "getNode(String)" methods, the Row interface has both "getPath()" and "getPath(String)" methods.
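As a sketch (with hypothetical class and method names), the first method below assumes a single-selector query, while the second works with any number of selectors. The null check is a defensive assumption for selectors that have no node in a given row:

```java
import javax.jcr.RepositoryException;
import javax.jcr.query.QueryResult;
import javax.jcr.query.Row;

// Hypothetical helper class, used only for illustration.
public class RowPathExample {

    // For queries with a single selector
    public static String pathOf(Row row) throws RepositoryException {
        return row.getPath();
    }

    // For queries with one or more selectors
    public static void printPaths(QueryResult result, Row row) throws RepositoryException {
        for (String selectorName : result.getSelectorNames()) {
            String path = row.getPath(selectorName);
            // Defensive check (an assumption): a selector may have no node in this row
            System.out.println(selectorName + " -> " + (path == null ? "(none)" : path));
        }
    }
}
```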
|Some implementations might track the path with the query results, and may not need to load the node. Thus if you just need the Path, this might be slightly faster in some implementations.|
Although accessing the results as a table often makes the most sense, if the query involves only a single selector and your application only needs access to the nodes that satisfied the criteria, your application can get a NodeIterator that will access the result nodes in the order specified by the query:
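A minimal sketch, assuming the result came from a single-selector query (the class and method names are hypothetical):

```java
import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.RepositoryException;
import javax.jcr.query.QueryResult;

// Hypothetical helper class, used only for illustration.
public class ResultNodesExample {

    public static void printNodePaths(QueryResult result) throws RepositoryException {
        // Only usable when the query has a single selector
        NodeIterator nodes = result.getNodes();
        while (nodes.hasNext()) {
            Node node = nodes.nextNode();
            System.out.println(node.getPath());
        }
    }
}
```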
|Using this style to access the result nodes works well for the older, deprecated JCR-SQL and XPath query languages, since they only support a single selector (no joins). Thus, many applications that used JCR 1.0 will access query results in this fashion. However, where possible, it's better to use the table-style access, as it's much more flexible and able to be used with any queries, including those with many selectors.|