Skip to end of metadata
Go to start of metadata

The JCR API defines a way to query a repository for content that meets user-defined criteria. The JCR 2.0 API actually makes it possible for implementations to support multiple query languages, and the specification requires support for two languages: JCR-SQL2 and JCR-QOM.JCR 1.0defined two other languages (XPath and JCR-SQL), though these languages were deprecated inJCR 2.0.

At this time, ModeShape supports all of these query languages, plus one search-engine-like language called "search" that is actually just the full-text search expression grammar used in the second parameter of the CONTAINS(...) function of the JCR-SQL2 language.

ModeShape handles all of these languages in nearly the same manner, the only difference being whether the query is represented as a string or build programmatically using the javax.jcr.query.qom part of the JCR API.

  1. A language-independent representation, called the query model, is constructed by parsing the string representation of the query (using a language-specific parser) or the JCR-QOM objects created by the client.
  2. The language-independent query model is used to create a canonical (relational) query plan.
  3. The canonical query plan is then validated to ensure that all identifiers in the query are resolvable.
  4. The canonical query plan is then optimized using a flexible rule-based optimizer. Optimizations include (but are not limited to): replace view references; unify handling of aliases; convert right outer joins into left outer joins; choose algorithms for each join; raise and lower criteria; push projection of columns as low in the plan as possible; duplicate criteria across identity joins; rewrite identity joins involving only columns that form keys; remove parts of the plan that (based upon the criteria) will return no rows; determination of the low-level "access" queries that will be submitted to the connector layer.
  5. The optimized query plan is then executed, whereby each access query is pushed down to the connector and the results are then processed and combined to produce the desired result set.
    Note that only the parsing step is dependent upon the query language. This means that all of the query languages are processed using the same, unified engine.

The rest of this chapter describes how your applications can use queries to search your repositories, and outlines the specifics of each of the four query languages available in ModeShape.

JCR Query API

With ModeShape, all query operations can be performed using only the JCR API interfaces. The first step is to obtain the QueryManager from your Session instance. The QueryManager interface defines methods for creating Query objects, executing queries, storing queries (not results) as Nodes in the repository, and reconstituting queries that were stored on Nodes. Thus, querying a repository generally follows this pattern:

For more detail about these methods or about how to use other facets of the JCR query API, please consult chapter 6 of the JCR 2.0 specification.

JCR XPath Query Language

The JCR 1.0 specification uses the XPath query language because node structures in JCR are very analogous to the structure of an XML document. Thus, XPath provides a useful language for selecting and searching workspace content. And since JCR 1.0 defines a mapping between XML and a workspace view called the "document view", adapting XPath to workspace content is quite natural.

A JCR XPath query specifies the subset of nodes in a workspace that satisfy the constraints defined in the query. Constraints can limit the nodes in the results to be those nodes with a specific (primary or mixin) node type, with properties having particular values, or to be within a specific subtree of the workspace. The query also defines how the nodes are to be returned in the result sets using column specifiers and ordering specifiers.

ModeShape offers a bit more functionality in the "jcr:contains(...)" clauses than required by the specification. In particular, the second parameter specifies the search expression, and for these ModeShape accepts full-text search language expressions, including wildcard support.

As an aside, ModeShape actually implements XPath queries by transforming them into the equivalent JCR-SQL2 representation. And the JCR-SQL2 language, although often more verbose, is much more capable of representing complex queries with multiple combinations of type, property, and path constraints.

Column Specifiers

JCR 1.0 specifies that support is required only for returning column values based upon single-valued, non-residual properties that are declared on or inherited by the node types specified in the type constraint. ModeShape follows this requirement, and does not specifying residual properties. However, ModeShape does allow multi-valued properties to be specified as result columns. And as per the specification, ModeShape always returns the "jcr:path" and "jcr:score" pseudo-columns.

ModeShape uses the last location step with an attribute axis to specify the properties that are to be returned as result columns. Multiple properties are specified with a union. For example, the following table shows several XPath queries and how they map to JCR-SQL2 queries.

XPath JCR-SQL2
//* SELECT * FROM [nt:base]
//element(*,my:type) SELECT * FROM [my:type]
//element(*,my:type)/@my:title SELECT [my:title] FROM [my:type]
//element(*,my:type)/(@my:title | @my:text) SELECT [my:title], [my:text] FROM [my:type]
//element(*,my:type)/(@my:title union @my:text) SELECT [my:title], [my:text] FROM [my:type]

Specifying result set columns

Type Constraints

JCR 1.0 specifies that support is required only for specifying constraints of one primary type, and it is optional to support specifying constraints on one (or more) mixin types. The specification also defines that the XPath element test be used to test against node types, and that it is optional to support element tests on location steps other than the last one. Type constraints are inherently inheritance-sensitive, in that a constraint against a particular node type 'X' will be satisfied by nodes explicitly declared to be of type 'X' or of subtypes of 'X'.

ModeShape does support using the element test to test against primary or mixin type. ModeShape also only supports using an element test on the last location step. For example, the following table shows several XPath queries and how they map to JCR-SQL2 queries.

XPath JCR-SQL2
//* SELECT * FROM [nt:base]
//element(*,my:type) SELECT * FROM [my:type]
/jcr:root/nodes/element(*,my:type) SELECT * FROM [my:type]
WHERE PATH([my:type])> LIKE '/nodes/%'
{{AND DEPTH([my:type]) = CAST(2 AS LONG) }}
/jcr:root/nodes//element(*,my:type) SELECT * FROM [my:type]
WHERE PATH([my:type]) LIKE '/nodes/%'
/jcr:root/nodes//element(ex:nodeName,my:type) SELECT * FROM [my:type]
WHERE PATH([my:type]) LIKE '/nodes/%'
AND NAME([my:type]) = 'ex:nodeName'

Specifying type constraints

Note that the JCR-SQL2 language supported by ModeShape is far more capable of joining multiple sets of nodes with different type, property and path constraints.

Property Constraints

JCR 1.0 specifies that attribute tests on the last location step is required, but that predicate tests on any other location steps are optional.

ModeShape does support using attribute tests on the last location step to specify property constraints, as well as supporting axis and filter predicates on other location steps. For example, the following table shows several XPath queries and how they map to JCR-SQL2 queries.

XPath JCR-SQL2
//*[] SELECT * FROM [nt:base]
WHERE [nt:base].prop1 IS NOT NULL
//element(*,my:type)[@prop1] SELECT * FROM [my:type]
WHERE [my:type].prop1 IS NOT NULL
//element(*,my:type)[@prop1=xs:boolean('true')] SELECT * FROM [my:type]
WHERE [my:type].prop1 = CAST('true' AS BOOLEAN)
//element(*,my:type)[@id<1 and @name='john'] SELECT * FROM [my:type]
WHERE id < 1 AND name = 'john'
//element(*,my:type)[a/b/@id] SELECT * FROM [my:type]
JOIN [nt:base] as nodeSet1
ON ISCHILDNODE(nodeSet1,[my:type])
JOIN [nt:base] as nodeSet2
ON ISCHILDNODE(nodeSet2,nodeSet1)
WHERE (NAME(nodeSet1) = 'a'
{{AND NAME(nodeSet2) = 'b') }}
AND nodeSet2.id IS NOT NULL]
//element(,my:type)[./{}{}/*/@id] SELECT * FROM [my:type]
JOIN [nt:base] as nodeSet1
ON ISCHILDNODE(nodeSet1,[my:type])
JOIN [nt:base] as nodeSet2
ON ISCHILDNODE(nodeSet2,nodeSet1)
WHERE nodeSet2.id IS NOT NULLL
//element(*,my:type)[.//@id] SELECT * FROM [my:type]
JOIN [nt:base] as nodeSet1
ON ISDESCENDANTNODE(nodeSet1,[my:type])
WHERE nodeSet2.id IS NOT NULLL

Specifying property constraints

Section 6.6.3.3 of the JCR 1.0 specification contains an in-depth description of property value constraints using various comparison operators.

Path Constraints

JCR 1.0 specifies that exact, child node, and descendants-or-self path constraints be supported on the location steps in an XPath query.

ModeShape does support the four kinds of path constraints. For example, the following table shows several XPath queries and how they map to JCR-SQL2 queries.

XPath JCR-SQL2
/jcr:root/a[1]/b[2] SELECT * FROM [nt:base]
WHERE PATH([nt:base]) = '/a[1]/b[2]'
/jcr:root/a/b[*] SELECT * FROM [nt:base]
WHERE PATH([nt:base]) = '/a[%]/b[%]'
/jcr:root/a[1]/b[*] SELECT * FROM [nt:base]
WHERE PATH([nt:base]) = '/a[%]/b[%]'
/jcr:root/a[2]/b SELECT * FROM [nt:base]
WHERE PATH([nt:base]) = '/a[2]/b[%]'
/jcr:root/a/b[2]//c[4] SELECT * FROM [my:type]
WHERE PATH([nt:base]) = '/a[%]/b[2]/c[4]'
OR PATH(nodeSet1) LIKE '/a[%]/b[\2]/%/c[\4]'
/jcr:root/a/b//c//d SELECT * FROM [my:type]
WHERE PATH([nt:base]) = '/a[%]/b[%]/c[%]/d[%]'
OR PATH([nt:base]) LIKE '/a[%]/b[%]/%/c[%]/d[%]'
OR PATH([nt:base]) LIKE '/a[%]/b[%]/c[%]/%/d[%]'
OR PATH([nt:base]) LIKE '/a[%]/b[%]/%/c[%]/%/d[%]'
//element(*,my:type)[@id<1 and @name='john'] SELECT * FROM [my:type]
WHERE id < 1 AND name = 'john'
/jcr:root/a/b//element(*,my:type) SELECT * FROM [my:type]
WHERE PATH([my:type]) = '/a[%]/b[%]/%'

Specifying path constraints

Note that the JCR-SQL2 language supported by ModeShape is capable of representing a wider combination of path constraints, although the XPath expressions are easier to understand and significantly shorter.

Also, path constraints in XPath do not need to specify wildcards for the same-name-sibling (SNS) indexes, as XPath should naturally find all nodes regardless of the SNS index, unless the SNS index is explicitly specified. In other words, any path segment that does not have an explicit SNS index (or an SNS index of '[%]' or '[]') will match _all SNS index values. However, any segments in the path expression that have an explicit numeric SNS index will require an exact match. Thus this path constraint:

will effectively be converted into

This behavior is very different than how JCR-SQL and JCR-SQL2 path constraints are handled, since these languages interpret a lack of a SNS index as equating to '[1]'. To achieve the XPath-like matching, a query written in JCR-SQL or JCR-SQL2 would need to explicitly include '[%]' in each path segment where an SNS index literal is not already specified.

Ordering Specifiers

JCR 1.0 extends the XPath grammar to add support for ordering the results according to the natural ordering of the values of one or more properties on the nodes.

ModeShape does support zero or more ordering specifiers, including whether each specifier is ascending or descending. If no ordering specifiers are defined, the ordering of the results is not predefined and may vary (though ordering by score may be used by default). For example, the following table shows several XPath queries and how they map to JCR-SQL2 queries.

XPath JCR-SQL2
//element(,) order by @title SELECT nodeSet1.title
FROM [nt:base] AS nodeSet1
ORDER BY nodeSet1.title
//element(,) order by jcr:score() SELECT *
FROM [nt:base] AS nodeSet1
ORDER BY SCORE(nodeSet1)
//element(*,my:type) order by jcr:score(my:type) SELECT *
FROM [my:type] AS nodeSet1
ORDER BY SCORE(nodeSet1)
//element(,) order by @jcr:path SELECT jcr:path
FROM [nt:base] AS nodeSet1
ORDER BY PATH(nodeSet1)
//element(,) order by @title, @jcr:score SELECT nodeSet1.title
FROM [nt:base] AS nodeSet1
ORDER BY nodeSet1.title,
SCORE(nodeSet1)

Specifying result ordering

Note that the JCR-SQL2 language supported by ModeShape has a far richer ORDER BY clause, allowing the use of any kind of dynamic operand, including ordering upon arithmetic operations of multiple dynamic operands.

Miscellaneous

JCR 1.0 defines a number of other optional and required features, and these are summarized in this section.

  • Only abbreviated XPath syntax is supported.
  • Only the child axis (the default axis, represented by '/' in abbreviated syntax), descendant-or-self axis (represented by '//' in abbreviated syntax), self axis (represented by '.' in abbreviated syntax), and attribute axis (represent by '@' in abbreviated syntax) are supported.
  • The text() node test is not supported.
  • The element() node test is supported.
  • The jcr:like() function is supported.
  • The jcr:contains() function is supported.
  • The jcr:score() function is supported.
  • The jcr:deref() function is not supported.

JCR-SQL Query Language

The JCR-SQL query language is defined by the JCR 1.0 specification as a way to express queries using strings that are similar to SQL. Support for the language is optional, and in fact this language was deprecated in the JCR 2.0 specification in favor of the improved and more powerful (and more SQL-like) JCR-SQL2 language, which is covered in the next section.

The JCR 2.0 specification defines how nodes in a repository are mapped onto relational tables queryable through a SQL-like language, including JCR-SQL and JCR-SQL2. Basically, each node type is mapped as a relational view with a single column for each of the node type's (residual and non-residual) property definitions. Conceptually, each node in the repository then appears as a record inside the view corresponding to the node type for which "Node.isNodeType(nodeTypeName)" would return true.

Since each node likely returns true from this method for multiple node type (e.g., the primary node type, the mixin types, and all supertypes of the primary and mixin node types), all nodes will likely appear as records in multiple views. And since each view only exposes those properties defined by (or inherited by) the corresponding node type, a full picture of a node will likely require joining the views for multiple node types. This special kind of join, where the nodes have the same identity on each side of the join, is referred to as an identity join, and is handled very efficiently by ModeShape.

ModeShape includes support for the JCR-SQL language, and adds several extensions to make it even more powerful and useful:

  • Support for the UNION, INTERSECT, and EXCEPT set operations on multiple result sets to form a single result set. As with standard SQL, the result sets being combined must have the same columns. The UNION operator combines the rows from two result sets, the INTERSECT operator returns the difference between two result sets, and the EXCEPT operator returns the rows that are common to two result sets. Duplicate rows are removed unless the operator is followed by the ALL keyword. For detail, see the grammar for set queries.
  • Removal of duplicate rows in the results, using "SELECT DISTINCT ...".
  • Limiting the number of rows in the result set with the "LIMIT count" clause, where count is the maximum number of rows that should be returned. This clause may optionally be followed by the "OFFSET number" clause to specify the number of initial rows that should be skipped.
  • Support for the IN and NOT IN clauses to more easily and concisely supply multiple of discrete static operands. For example, "WHERE ... prop1 IN (3,5,7,10,11,50) ...".
  • Support for the BETWEEN clause to more easily and concisely supply a range of discrete operands. For example, "WHERE ... prop1 BETWEEN 3 EXCLUSIVE AND 10 ...".
  • Support for (non-correlated) subqueries in the WHERE clause, wherever a static operand can be used. Subqueries can even be used within another subquery. All subqueries must return a single column, and each row's single value will be treated as a literal value. If the subquery is used in a clause that expects a single value (e.g., in a comparison), only the subquery's first row will be used. If the subquery is used in a clause that allows multiple values (e.g., IN (...)), then all of the subquery's rows will be used. For example, this query "WHERE ... prop1 IN ( SELECT my:prop2 FROM my:type2 WHERE my:prop3 < '1000' ) AND ..." will use the results of the subquery as the literal values in the IN clause.

The grammar for the JCR-SQL query language is actually a superset of that defined by the JCR 1.0 specification, and as such the complete grammar is included here.

The grammar is presented using the same EBNF nomenclature as used in the JCR 1.0 specification. Terms are surrounded by '[' and ']' denote optional terms that appear zero or one times. Terms surrounded by '{' and '}' denote terms that appear zero or more times. Parentheses are used to identify groups, and are often used to surround possible values. Literals (or keywords) are denoted by single-quotes.

Grammar

JCR-SQL2 Query Language

The JCR-SQL2 query language is defined by the JCR 2.0 specification as a way to express queries using strings that are similar to SQL. This query language is an improvement over the JCR-SQL language, providing among other things far richer specifications of joins and criteria.

ModeShape includes full support for the complete JCR-SQL2 query language. However, ModeShape adds several extensions to make it even more powerful:

  • Support for the "FULL OUTER JOIN" and "CROSS JOIN" join types, in addition to the "LEFT OUTER JOIN", "RIGHT OUTER JOIN" and "INNER JOIN" types defined by JCR-SQL2. Note that "JOIN" is a shorthand for "INNER JOIN". For detail, see the grammar for joins.
  • Support for the UNION, INTERSECT, and EXCEPT set operations on multiple result sets to form a single result set. As with standard SQL, the result sets being combined must have the same columns. The UNION operator combines the rows from two result sets, the INTERSECT operator returns the difference between two result sets, and the EXCEPT operator returns the rows that are common to two result sets. Duplicate rows are removed unless the operator is followed by the ALL keyword. For detail, see the grammar for set queries.
  • Removal of duplicate rows in the results, using "SELECT DISTINCT ...". For detail, see the grammar for queries.
  • Limiting the number of rows in the result set with the "LIMIT count" clause, where count is the maximum number of rows that should be returned. This clause may optionally be followed by the "OFFSET number" clause to specify the number of initial rows that should be skipped. For detail, see the grammar for limits and offsets.
  • Additional dynamic operands "DEPTH(<selectorName>)" and "PATH(<selectorName>))" that enable placing constraints on the node depth and path, respectively. These dynamic operands can be used in a manner similar to "NAME(<selectorName>)" and "LOCALNAME(<selectorName>)" that are defined by JCR-SQL2. Note in each of these cases, the selector name is optional if there is only one selector in the query. For detail, see the grammar for dynamic operands.
  • Additional dynamic operand "REFERENCE(<selectorName>.<propertyName>)" and "REFERENCE(<propertyName>)" that enables placing constraints on one or any of the reference properties, respectively, and which can be used in a manner similar to " PropertyValue(<propertyName>)". Note in each of these cases, the selector name is optional if there is only one selector in the query, and that the property name can be excluded if the constraint should apply to all reference properties. For detail, see the grammar for dynamic operands.
  • Support for the IN and NOT IN clauses to more easily and concisely supply multiple of discrete static operands. For example, "WHERE ... [my:type].[prop1] IN (3,5,7,10,11,50) ...". For detail, see the grammar for set constraints.
  • Support for the BETWEEN clause to more easily and concisely supply a range of discrete operands. For example, "WHERE ... [my:type].[prop1] BETWEEN 3 EXCLUSIVE AND 10 ...". For detail, see the grammar for between constraints.
  • Support for simple arithmetic in numeric-based criteria and order-by clauses. For example, "... WHERESCORE(type1) +SCORE(type2) > 1.0" or "... ORDER BY(SCORE(type1) * SCORE(type2)) ASC,LENGTH(type2.property1) DESC". For detail, see the grammar for order-by clauses.
  • Support for (non-correlated) subqueries in the WHERE clause, wherever a static operand can be used. Subqueries can even be used within another subquery. All subqueries must return a single column, and each row's single value will be treated as a literal value. If the subquery is used in a clause that expects a single value (e.g., in a comparison), only the subquery's first row will be used. If the subquery is used in a clause that allows multiple values (e.g., IN (...)), then all of the subquery's rows will be used. For example, this query "WHERE ... [my:type].[prop1] IN (SELECT [my:prop2] FROM [my:type2] WHERE [my:prop3] < '1000' ) AND ..." will use the results of the subquery as the literal values in the IN clause.
  • Support for several pseudo-columns ("jcr:path", "jcr:score", "jcr:name", "mode:localName", and "mode:depth") that can be used in the SELECT, equijoin, and WHERE clauses. These pseudo-columns make it possible to return location-related and score information within the QueryResult's rows. They also make queries look more like SQL, and thus may be more friendly and easier to use in existing SQL-aware client applications. See the detailed description for more information.

The grammar for the JCR-SQL2 query language is actually a superset of that defined by the JCR 2.0 specification, and as such the complete grammar is included here.

The grammar is presented using the same EBNF nomenclature as used in the JCR 2.0 specification. Terms are surrounded by '[' and ']' denote optional terms that appear zero or one times. Terms surrounded by '{' and '}' denote terms that appear zero or more times. Parentheses are used to identify groups, and are often used to surround possible values. Literals (or keywords) are denoted by single-quotes.

Grammar

Queries

Sources

Joins

Equi-Join Conditions

Same-Node Join Conditions

Child-Node Join Conditions

Descendant-Node Join Conditions

Constraints

And Constraints

Or Constraints

Not Constraints

Comparison Constraints

Between Constraints

Property Existence Constraints

Set Constraints

Full-text Search Constraints

where FulltextSearch is defined by the following, and is the same as the full-text search language supported by ModeShape:

Same-Node Constraint

Child-Node Constraints

Descendant-Node Constraints

Paths and Names

Static Operands

Bind Variables

Subqueries

Dynamic Operands

Ordering

Columns

Limit and Offset

Pseudo-columns

The design of the JCR-SQL2 query language makes fairly heavy use of functions, including SCORE(), NAME(), and LOCALNAME(). ModeShape adds several more useful functions, including PATH() and DEPTH(), that follow the same patterns.

However, there are several disadvantages of these functions. First, they make the JCR-SQL2 language less "SQL-like", since SQL-92 and -99 don't define these kinds of functions. (There are aggregate functions, like COUNT, SUM, etc., but they are not terribly analogous.) This means that applications that use SQL and SQL-like query languages are less likely to be able to build and issue JCR-SQL2 queries.

A second disadvantage of these functions is that JCR-SQL2 does not allow them to be used within the SELECT clause. As a result, the location-related and score information cannot be included as columns of values in the QueryResult rows. Instead, a client can only access this information by obtaining the Node object(s) for each row. Relying upon both the result set and additional Java objects makes it difficult to use.

For example, ModeShape's JDBC driver is designed to enable JDBC-aware applications to query repository content using JCR-SQL2 queries. The standard JDBC API cannot expose the Node objects, so the only way to return the path-related and score information is through additional columns in the result. While such columns could "magically" appear in the result set, doing this is not compatible with JDBC applications that dynamically build queries based upon database metadata. Such applications require the columns to be properly described in database metadata, and the columns need to be used within queries.

ModeShape attempts to solve these issues by directly supporting a number of "pseudo-columns" within JCR-SQL2 queries, wherever columns can be used. These "pseudo-columns" include:

  • jcr:score is a column of type DOUBLE that represents the full-text search score of the node, which is a measure of the node's relevance to the full-text search expression. ModeShape does compute the scores for all queries, though the score for rows in queries that do not include a full-text search criteria may not be reliable.
  • jcr:path is a column of type PATH that represents the normalized path of a node, including same-name siblings. This is the same as what would be returned by the getPath() method of Node. Examples of paths include "/jcr:system" and "/foo/bar[3]".
  • jcr:name is a column of type NAME that represents the node name in its namespace-qualified form using namespace prefixes and excluding same-name-sibling indexes. Examples of node names include "jcr:system", "jcr:content", "ex:UserData", and "bar".
  • mode:localName is a column of type STRING that represents the local name of the node, which excludes the namespace prefix and same-name-sibling index. As an example, the local name of the "jcr:system" node is "system", while the local name of the "ex:UserData[3]" node is "UserData".
  • mode:depth is a column of type LONG that represents the depth of a node, which corresponds exactly to the number of path segments within the path. For example, the depth of the root node is 0, whereas the depth of the "/jcr:system/jcr:nodeTypes" node is 2.

All of these pseudo-columns can be used in the SELECT clause of any JCR-SQL2 query, and their use defines whether such columns appear in the result set. In fact, all of these pseudo-columns will be included when "SELECT *" clauses in JCR-SQL2 queries are expanded by the query engine. This means that every node type (even mixin node types that have no properties and are essentially markers) are represented by a queryable table.

Like any other column, all of these pseudo-columns can be also be used in the WHERE clause of any JCR-SQL2 query, even if they are not included in the SELECT clause. They can be used anywhere that a regular column can be used, including within constraints and dynamic operands. ModeShape will automatically rewrite queries that use pseudo-columns in the dynamic operands to use the corresponding function, such as SCORE(), PATH(), NAME(), LOCALNAME(), and DEPTH(). Additionally, any property existence constraint using these pseudo-columns will always evaluate to 'true' (and will thus be removed by the optimizer).

The jcr:path pseudo-column may also be used on both sides of an equijoin constraint clause. For example:

Equijoins of this form will be automatically rewritten by the optimizer to the following form:

As with regular columns, the pseudo-columns must be qualified with the selector name if the query contains more than one selector.

Note that the jcr:path and jcr:score pseudo-columns are consistent with the pseudo-columns of the same names used in JCR-SQL query language. However, unlike in JCR-SQL, in JCR-SQL2 these columns are not automatically included in the results unless explicitly included in the SELECT clause or implicitly included via "SELECT *"

Example JCR-SQL2 queries

One of the simplest JCR-SQL2 queries finds all nodes in the current workspace of the repository:

This query will return a result set containing the "jcr:primaryType" column, since the nt:base defines only one single-valued property called "jcr:primaryType". (As allowed by the JCR 2.0 Specification, ModeShape does not currently support returning multi-valued columns in result sets, and thus the above query does not include a column for the jcr:mixinTypes multi-valued property.)

Since our query used "SELECT *", ModeShape also includes the non-standard pseudo-columns mentioned above: "jcr:path", "jcr:score", "jcr:name", "mode:localName", and "mode:depth". These columns are very convenient to have in the results, but also make certain criteria much easier than with the corresponding standard or ModeShape-specific functions.

Queries can explicitly specify the columns that are to be returned in the results. The following query is very similar to the previous query and will return the same rows, but the result set will have only a single column:

The following query will return the same rows as in the previous two queries, but the SELECT clause explicitly includes only two of the pseudo-columns for the path and depth (which are computed from the nodes' locations):

In JCR-SQL2, a table representing a particular node type will have a column for each of the node type's property definitions, including those inherited from supertypes. For example, the nt:file node type, its nt:hierarchyNode supertype, and the mix:created mixin type are defined using the CND notation as follows:

Therefore, the table representing the nt:file node type will have two three columns: the jcr:created and jcr:createdBy columns inherited from the mix:created mixin node type (via the nt:hierarchyNode node type), and the jcr:primaryType column inherited from the nt:base node type, which is the implicit supertype of the nt:hierarchyNode.

ModeShape adheres to this behavior with the exception that a "SELECT *" will result in the additional pseudo-columns. Thus, this next query:

is equivalent to this query:

Here is an example query that selects some of the available columns from the nt:file table and uses a constraint to ensure the resulting file nodes have names that end in '.txt':

This query is JCR-SQL2, but ModeShape supports placing criteria against the mode:localName pseudo-column instead of using the LOCALNAME() function. Such a query is equivalent to the previous query and will produce the exact same results:

Although this query looks much more like SQL, the use of the '[' and ']' characters to quote the identifiers is not typical of a SQL dialect. ModeShape actually supports the using double-quote characters and square braces interchangeably around identifiers (although they must match around any single identifier). Again, this next query, which looks remarkably like any SQL-92 or -99 dialect, is functionally identical to the previous two queries:

In JCR-SQL2, a node will appear as a row in each table that corresponds to the node types defined by that node's primary type or mixin types, or any supertypes of these node types. In other words, a node will appear in the table corresponding to each node type for which Node.isNodeType(...) returns true.

For example, consider a node that has a primary type of nt:file but has an explicit mixin of mix:referenceable. This node will appear as a row in the all of these tables: nt:file, mix:referenceable, nt:hierarchyNode, mix:created, and nt:base. However, the columns in each of these tables will differ. The nt:file node type has the nt:hierarchyNode, mix:created, and nt:base for supertypes, and therefore the table for nt:file contains columns for the property definitions on all of these types. But because mix:referenceable is not a supertype of nt:file, the table for nt:file will not contain a jcr:uuid column. To obtain a single result set that contains columns for all the properties of our node, we need to perform an identity join. The next query shows how this is done to return all properties for nt:file nodes that are also mix:referenceable:

ModeShape expands the SELECT clause to include the columns for all (explicit and inherited) property definitions of each type plus pseudo-columns for each type:

Note because we are using an identity join, the "file.[jcr:path]" column will contain the same value as the "ref.[jcr:path]". Therefore, as is typical with standard SQL and relational databases, with JCR-SQL2 you will probably want to fully expand the SELECT clause to be exactly what you want. Here is a query that does this by eliminating columns with duplicate values and using aliases that are simpler than the namespace-qualified names:

Although this query looks much more like SQL, JCR-SQL2's use of the '[' and ']' characters to quote the identifiers is not typical of a SQL dialect. ModeShape actually supports the using double-quote characters and square braces interchangeably around identifiers (although they must match around any single identifier). This makes it easier for existing SQL-oriented tools and applications to work more readily with ModeShape, including applications that use ModeShape's JDBC driver to query a ModeShape JCR repository.

Again, this next query, which looks remarkably like any SQL-92 or -99 dialect, is functionally identical to the previous query, except that it uses double quotes and a pseudo-column identity constraint on "jcr:path" (which is identical in semantics and performance as the "ISSAMENODE(...)" constraint):

These are examples of two-way inner joins, but ModeShape supports joining multiple tables together in a single query. ModeShape also supports a variety of joins, including INNER JOIN (or just JOIN), LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, and CROSS JOIN.

ModeShape also supports several other query features beyond JCR-SQL2. One of these is support for UNION, INTERSECT, EXCEPT, UNION ALL, INTERSECT ALL and EXCEPT ALL. Here is an example of a union:

ModeShape also supports using (non-correlated) subqueries within the WHERE clause, wherever a static operand can be used. Subqueries can even be used within another subquery. All subqueries, though, should return a single column (all other columns will be ignored), and each row's single value will be treated as a literal value. If the subquery is used in a clause that expects a single value (e.g., in a comparison), only the subquery's first row will be used.

Subqueries in ModeShape are a powerful and easy way to use more complex criteria that is a function of the content in the repository, without having to resort to multiple queries (take the results of one query and dynamically generate the criteria of another query).

Here's an example of a query that finds all nt:file nodes in the repository whose paths are referenced in the value of the vdb:originalFile property of the vdb:virtualDatabase nodes. (This query also uses the "$maxVersion" bind variables in the subquery.)

Without subqueries, this query would need to be broken into two separate queries: the first would find all of the paths referenced by the vdb:virtualDatabase nodes matching the version and description criteria, followed by one (or more) subsequent queries to find the nt:file nodes with the paths expressed as literal values (or bind variables). Using a subquery is not only easier to implement and understand, it is actually more efficient.

The examples shown in this section hopefully begin to show the power and flexibility of JCR-SQL2 and the ModeShape extensions.

Full-Text Search Language

There are times when a formal structured query language is overkill, and the easiest way to find the right content is to perform a search, like you would with a search engine such as Google or Yahoo! This is where ModeShape's full-text search language comes in, because it allows you to use the JCR query API but with a far simpler, Google-style search grammar.

This query language is actually defined by the JCR 2.0 specification as the full-text search expression grammar used in the second parameter of the CONTAINS(...) function of the JCR-SQL2 language. We just pulled it out and made it available as a first-class query language, such that a full-text search query supplied by the user, full-text-query, is equivalent to executing this JCR-SQL2:

This language allows a JCR client to construct a query to find nodes with property values that match the supplied terms. Nodes that "best" match the terms are returned before nodes that have a lesser match. Of course, ModeShape uses a complex system to analyze the node content and the query terms, and may perform a number of optimizations, such as (but not limited to) eliminating stop words (e.g., "the", "a", "and", etc.), treating terms independent of case, and converting words to base forms using a process called stemming (e.g., "running" into "run", "customers" into "customer").

Search terms can also include phrases by simply wrapping the phrase with double-quotes. For example, the search term 'table "customer invoice"' would rank higher those nodes with properties containing the phrase "customer invoice" than nodes with properties containing just "customer" or "invoice".

Term in the query are implicitly AND-ed together, meaning that the matches occur when a node has property values that match all of the terms. However, it is also possible to put an "OR" in between two terms where either of those terms may occur.

By default, all terms are assumed to be positive terms, in the sense that the occurrence of the term will increase the rank of any nodes containing the value. However, it is possible to specify that terms should not appear in the results. This is called a negative term, and it reduces the rank of any node whose property values contain the the value. To specify a negative term, simply prefix the term with a hyphen ('-').

Each term may also contain wildcards to specify the pattern to be matched (or negated). ModeShape supports two different sets of wildcards:

  • '*' matches zero or more characters, and '?' matches any single character; and
  • '%' matches zero or more characters, and '_' matches any single character.
    The former are wildcards that are more commonly used in various systems (including older JCR repository implementations), while the latter are the wildcards used in LIKE expressions in both JCR-SQL and JCR-SQL2. Both families are supported for convenience, and you can also mix and match and combine the various wildcards, such as 'ta*bl_' and 'ta__ble%'. (Of course, placing multiple '' or '%' characters next to each other offers no real benefit, as it is equivalent to a single '' or '%'.)

If you want to use these characters literally in a term and do not want them to be treated as wildcards, they must be escaped by prefixing them with a '\' character. For example, this full text search expression:

will would rank higher those nodes with properties containing 'table*' (including the asterisk) and those containing the phrase "customer invoice?" (including the question mark). To use a literal backslash character, simply escape it as well.

The grammar for this full-text search language is specified in Section 6.7.19 of the JCR 2.0 specification, but it is also included here as a convenience.

The grammar is presented using the same EBNF nomenclature as used in the JCR 2.0 specification. Terms are surrounded by '[' and ']' denote optional terms that appear zero or one times. Terms surrounded by '{' and '}' denote terms that appear zero or more times. Parentheses are used to identify groups, and are often used to surround possible values.

Grammar

As you can see, this is a pretty simple and straightforward query language. But this language makes it extremely easy to find all the nodes in the repository that match a set of terms.

When using this query language, the QueryResult always contains the "jcr:path" and "jcr:score" columns.

JCR Query Object Model (JCR-QOM) API

JCR 2.0 introduces a new API for programmatically constructing a query. This API allows the client to construct the lower-level objects for each part of the query, and is a great fit for applications that would otherwise generate fairly complicated query expressions. Using this API is a matter of getting the QueryObjectModelFactory from the session's QueryManager, and using the factory to create the various components, starting with the lowest-level components. Then, these lower-level components can be passed to other factory methods to create the higher-level components, and so on, until finally the createQuery(...) method is called to return the QueryObjectModel.

Here is a simple example that shows how this is done for the simple query "SELECT * FROM [nt:unstructured] AS unstructNodes":

From this point on, processing the results is the same as when using the JCR Query API:

Of course, most queries will create the columns, orderings, and constraints using the QueryObjectModelFactory, whereas the example above just assumes all of the columns, no orderings, and no constraints.

Labels:
None
Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.