Skip to end of metadata
Go to start of metadata

The JCR-SQL2 query language is defined by the JCR 2.0 specification as a way to express queries using strings that are similar to SQL. This query language is an improvement over the earlier JCR-SQL language, providing among other things far richer specifications of joins and criteria.

Extensions to JCR-SQL2

ModeShape includes full support for the complete JCR-SQL2 query language defined by the specification. However, ModeShape adds several extensions to make it even more powerful:

  • Support for the "FULL OUTER JOIN" and "CROSS JOIN" join types, in addition to the "LEFT OUTER JOIN", "RIGHT OUTER JOIN" and "INNER JOIN" types defined by JCR-SQL2. Note that "JOIN" is a shorthand for "INNER JOIN". For detail, see the grammar for joins.
  • Support for the UNION, INTERSECT, and EXCEPT set operations on multiple result sets to form a single result set. As with standard SQL, the result sets being combined must have the same columns. The UNION operator combines the rows from two result sets, the INTERSECT operator returns the difference between two result sets, and the EXCEPT operator returns the rows that are common to two result sets. Duplicate rows are removed unless the operator is followed by the ALL keyword. For detail, see the grammar for set queries.
  • Removal of duplicate rows in the results, using "SELECT DISTINCT ..." expression. For detail, see the grammar for queries.
  • Limiting the number of rows in the result set with the "LIMIT count" clause, where count is the maximum number of rows that should be returned. This clause may optionally be followed by the "OFFSET number" clause to specify the number of initial rows that should be skipped. For detail, see the grammar for limits and offsets.
  • Additional dynamic operands "DEPTH(selectorName)" and "PATH(selectorName)" that enable placing constraints on the node depth and path, respectively. These dynamic operands can be used in a manner similar to "NAME(selectorName)" and "LOCALNAME(selectorName)" that are defined by JCR-SQL2. Note in each of these cases, the selectorName is optional if there is only one selector in the query. For detail, see the grammar for dynamic operands.
  • Additional dynamic operand "REFERENCE(selectorName.propertyName)" and "REFERENCE(selectorName)" that enables placing constraints on one or any of the reference properties, respectively, and which can be used in a manner similar to the standard dynamic operand "PropertyValue(selectorName.propertyName)". Note in each of these cases, the selectorName is optional if there is only one selector in the query, and that the propertyName can be excluded if the constraint should apply to all reference properties. For detail, see the grammar for dynamic operands.
  • Support for the "IN" and "NOT IN" clauses to more easily and concisely supply multiple of discrete static operands. For example, "WHERE ... [my:type].[prop1] IN (3,5,7,10,11,50) ...". For detail, see the grammar for set constraints.
  • Support for the "BETWEEN" clause to more easily and concisely supply a range of discrete operands. For example, "WHERE ... [my:type].[prop1] BETWEEN 3 EXCLUSIVE AND 10 ...". For detail, see the grammar for between constraints.
  • Support for simple arithmetic in numeric-based criteria and order-by clauses. For example, "... WHERE SCORE(type1) + SCORE(type2) > 1.0" or "... ORDER BY (SCORE(type1) * SCORE(type2)) ASC, LENGTH(type2.property1) DESC". For detail, see the grammar for order-by clauses.
  • Support for (non-correlated) subqueries in the WHERE clause, wherever a static operand can be used. Subqueries can even be used within another subquery. All subqueries must return a single column, and each row's single value will be treated as a literal value. If the subquery is used in a clause that expects a single value (e.g., in a comparison), only the subquery's first row will be used. If the subquery is used in a clause that allows multiple values (e.g., "IN (...)"), then all of the subquery's rows will be used. For example, this expression "WHERE ... [my:type].[prop1] IN ( SELECT [my:prop2] FROM [my:type2] WHERE [my:prop3] < '1000' ) AND ..." will use the results of the subquery as the literal values in the IN clause. See the subqueries section for more information.
  • Support for several pseudo-columns ("jcr:path", "jcr:score", "jcr:name", "mode:localName", and "mode:depth") that can be used in the SELECT, equijoin, and WHERE clauses. These pseudo-columns make it possible to return location-related and score information within the QueryResult's rows. They also make queries look more like SQL, and thus may be more friendly and easier to use in existing SQL-aware client applications. See the pseudo-columns section for more information.
  • Support for NOT LIKE as an operator in comparison criteria, and which is equivalent to wrapping a LIKE comparison criteria in a NOT(...) clause.
  • Support for RELIKE(...) operator that tests patterns stored in property values against a supplied string parameter.

Extended JCR-SQL2 Grammar

The rest of this section documents the full grammar for ModeShape's extended JCR-SQL2 support. Again, this grammar is a strict superset of that defined by the JCR 2.0 specification. In other words, ModeShape will support any JCR-SQL2 query that uses the standard grammar, but it will also support queries that make use of the ModeShape extensions.

Queries

The top-level rule for ModeShape's extended JCR-SQL2 grammar is QueryCommand, which consists of both Query and SetQuery:

ModeShape adds the concept of a set query, which is a query that performs a union, intersection, or complement of the results of two other queries. Set queries are common in SQL (which is essentially a set manipulation language) and are a very useful tool that would otherwise require significant processing of the results of multiple queries by the application. By supporting set queries, the application merely needs to declare that set operation be performed, and ModeShape will perform all the work before returning the results.

ModeShape also adds the ability to use "SELECT DISTINCT", which eliminates duplicate rows in a manner similar to SQL.

Source

A source is a named set of tuples, which in ModeShape corresponds to the nodes of a particular named node type. In other words, a source is equivalent to a table in a relational database. The available columns of a source are the named properties declared on the node type.

In the JCR-SQL2 grammar, a source is either a selector (a named node type) or a join specification:

See the Name rule below.

Joins

The JCR 2.0 specification does include joins in the standard JCR-SQL2 grammar, though the only defined types of joins included inner, left outer, and right outer joins. Because SQL also defines the useful full outer and cross join types, ModeShape adds support for these.

Each of the four kinds of join conditions are described below.

Equi-join condition

An equijoin is a join that uses only equality comparisons in the join predicate (or join condition). Using any other operators (e.g., '<' or '!=') in the join condition disqualifies a query from being an equi-join.

Therefore, the rules for the equi-join condition are as follows:

where the node type referenced by the selector identified in the query with the selector1Name must contain the property given by the property1Name literal, and similarly the node type referenced by the selector identified in the query with the selector2Name must contain the property given by the property2Name literal.

See also the name rule below.

Same-node join condition

An identity join is a special case of an equijoin, where the compared properties are node identifiers. Thus the join condition of an identity join constrains the node on one sides of the join to be the same node on the other side of the join. The standard JCR-SQL2 grammar defines a special function that makes this a little easier to use:

See also the path rule below.

Child-node join condition

A child-node join is one where the join condition constrains the node on the left side of the join to be a child of the node on the right side of the join. The standard JCR-SQL2 grammar defines a special function that makes it easier to specify such join conditions:

Descendant-node join condition

A descendant-node join is one where the join condition constrains the node on the left side of the join to be a descendant of the node on the right side of the join. The standard JCR-SQL2 grammar defines a special function that makes it easier to specify such join conditions:

Constraints

The Query rule defined above included a "WHERE" clause that can define multiple constraints on the nodes included in the results. The standard JCR-SQL2 grammar defined several such constraints, including and, or, not, comparison, property existence, full-text search, same-node, child-node, and descendant-node constraints. ModeShape supports all of these, but adds two others: between and set constraints.

Each of these types of constraints are described below.

And constraint

An and constraint stipulates that a node (or record or tuple) is included only if two other constraints are both true.

Or constraint

An or constraint stipulates that a node (or record or tuple) is included if either of two other constraints are true.

Not constraint

The not qualifier will negate another constraint, requiring that a node (or record or tuple) is included if the other constraint is not true.

Comparison constraint

A comparison constraint requires that the value for a node described by the dynamic operand on the left side of the operator is to be compared to a static literal value. The term "dynamic operand" is used in the JCR-SQL2 grammar because its value can only be determined during query evaluation.

The DynamicOperand and StaticOperand rules are defined below.

The behavior of the operators is dictated by the JCR 2.0 specification and matches how Value objects are compared:

  • If the DynamicOperand evaluates to null, the constraint is not satisfied.
  • If the '=" operator is used, the value that the DynamicOperand evaluates to must equal the StaticOperand value for the constraint to be satisfied.
  • If the '!=" operator is used, the value that the DynamicOperand evaluates to must not equal the StaticOperand value for the constraint to be satisfied.
  • If the '<" operator is used, the value that the DynamicOperand evaluates to must be less than the StaticOperand value for the constraint to be satisfied.
  • If the '<=" operator is used, the value that the DynamicOperand evaluates to must be less than or equal to the StaticOperand value for the constraint to be satisfied.
  • If the '>" operator is used, the value that the DynamicOperand evaluates to must be greater than the StaticOperand value for the constraint to be satisfied.
  • If the '>=" operator is used, the value that the DynamicOperand evaluates to must be greater than or equal to the StaticOperand value for the constraint to be satisfied.
  • If the "LIKE" operator is used, the constraint is only satisfied if the value that the DynamicOperand evaluates to match the pattern specified by the string literal StaticOperand, where in the pattern:
    • the character "%" matches zero or more characters, and
    • the character "{_}" (underscore) matches exactly one character, and
    • the string "\x" matches the character "x", and
    • all other characters match themselves
  • If the "NOT LIKE" operator is used, the constraint is only satisfied if the value that the DynamicOperand evaluates to not match the pattern specified by the string literal StaticOperand, where in the pattern:
    • the character "%" matches zero or more characters, and
    • the character "{_}" (underscore) matches exactly one character, and
    • the string "\x" matches the character "x", and
    • all other characters match themselves

Also, note that, unlike SQL, the standard JCR-SQL2 grammar does not allow the left-hand side and right-hand sides of a comparison constraint to be swapped.

Between constraint

The between constraint is one of the extensions defined by ModeShape, and allows a query to more easily represent a range of static values than using only the constraints available in the standard JCR-SQL2 grammar. The between constraint is based on the similar expression in SQL.

Again, the DynamicOperand and StaticOperand rules are defined below.

Property existence constraint

A property existence constraint stipulates that a property does indeed exist on a node that is of the node type specified by the named selector. ModeShape does allow the "NOT" qualifier to be excluded, which turns the constraint into a stipulation that the property does not exist on the node.

Set constraint

Like the between constraint, the set constraint is a ModeShape extension to the standard JCR-SQL2 grammar that allows what would normally be a complicated combination of standard JCR-SQL2 constraints to be more easily represented with a single, simple expression. Again, this constraint is patterned after the similar expression in SQL.

Note that multiple static operands can be included in the comma-separated list. The StaticOperand rule is defined below.

Although this rule seems complicated, it's actually very straightforward. The following query selects all the properties defined on the "acme:taggable" node type, returning only those "taggable" nodes with a "acme:tagname" value of "tag1", "tag2", "tag3", or "tag4":

Even this trivial query is quite a bit simpler and easier to understand than if the query had used only the constraints defined by the standard JCR-SQL2 grammar:

Imagine how complicated a query might be with multiple joins, multiple criteria, and many values to be compared for one or several different properties.

Full-text search constraint

The full-text search expression is a string literal that adheres to the full-text search grammar described below.

An example query selects all the properties defined on the "acme:taggable" node type, returning only those "taggable" nodes with a "acme:tagname" value that contains the "foo" term within the value:

Same-node constraint

The same-node constraint stipulates that the node appearing in the selector with the given name has a path that matches the literal path provided.

where the Path rule is defined below.

Because this standard constraint clause is not really like traditional SQL, ModeShape defines a 'jcr:path' pseudo-column that can be used in comparison constraints and that allows for using other comparison operators, including "LIKE".

Child-node constraint

The child-node constraint stipulates that the node appearing in the selector with the given name is a child of a node with a path that matches the literal path provided.

See also ModeShape's 'jcr:path' pseudo-column that can be used in comparison constraints and that allows for using other comparison operators, including "LIKE". And because the right hand side (i.e., static operand) of a "LIKE" expression can involve wildcards, it may be easier and more understandable to use the pseudo-column.

Descendant-node constraint

The descendant-node constraint stipulates that the node appearing in the selector with the given name is a descendant of a node with a path that matches the literal path provided.

See also ModeShape's 'jcr:path' pseudo-column that can be used in comparison constraints and that allows for using other comparison operators, including "LIKE". And because the right hand side (i.e., static operand) of a "LIKE" expression can involve wildcards, it may be easier and more understandable to use the pseudo-column.

Reverse Like constraint

This is first available starting with ModeShape 3.6.

The standard JCR-SQL2 LIKE operator takes as input a fixed pattern and then attempts to find nodes whose value for the given property matches the pattern. However, sometimes the property values might actually store the patterns, and you want to find all property patterns that match a given fixed string parameter. The RELIKE (or "reverse like") makes this possible. To make this function more clear from the standard LIKE, the operands are reversed so that the pattern is the second parameter RELIKE.

For example, if the property of node "my:namePattern" on a node of type "my:country" contains a matching pattern of country name (e.g., "%Monaco%"), then we can find all "my:country" nodes that have a name pattern that matches "Principality of Monaco":

Path and name

Many of the rules above have used paths and names, and the rules for these are defined as follows:

Note that JCR-SQL2 surrounds identifiers with square brackets (e.g., '[' and ']'), allowing names to contain a ':' character needed with namespaced names. If the names or paths only contain valid SQL characters, then they do not need to be quoted.

Static operand

In the standard JCR-SQL2 grammar, a static operand appears on the right-hand side of an operator, and represents an expression whose value can be determined by static analysis of the query (e.g., when the query is parsed). In particular, a static operand in the standard JCR-SQL2 grammar comprised of either a literal value or a variable.

In SQL, however, the expression that appears on the right-hand side of an operator is not always able to be determined at query parse time. An example is a subquery, which appears on the right hand side but obviously can only be evaluated into values during query execution time. Since standard JCR-SQL2 does not include any such features, the term "static operand" is technically valid.

In addition to literal values and variables, ModeShape also supports subqueries appearing on the right-hand side of an operator. So this grammar continues to use the "static operand" term for easy comparison with the standard JCR-SQL2 grammar, but the term has a different (and expanded) semantic than in the standard grammar.

Therefore, the rules for what ModeShape allows on the right-hand side of an operator in a constraint is as follows:

where the grammar rules for BindVariableValue and Subquery are defined below.

Bind variable

The standard JCR-SQL2 grammar supports using variable names within a query, where the values for those variables are bound to the Query object before execution. In the query, the variable names are prefixed with a '$' character and are otherwise normal JCR name:

So, consider this simple query that selects all the properties defined on the "acme:taggable" node type, and that returns only those "taggable" nodes with a "acme:tagname" value that matches the value of the "tagValue" variable:

This query could be evaluated using the JCR API as follows:

Obviously multiple variables can be used in a query expression, but a value must be bound to every variable before the Query object can be executed.

Subquery

The standard JCR-SQL2 grammar does not support subqueries. But subqueries are such a useful feature, so ModeShape supports using multiple subqueries within a single query. In fact, subqueries are nothing more than a QueryCommand, which if you'll remember is the top-level rule in ModeShape's grammar. That means that subqueries can be any query, and you can even include subqueries within a subquery!

Strictly speaking, ModeShape only supports non-correlated subqueries, which means that they can actually be evaluated independently (outside the context of the containing query).

Additionally, because subqueries appear on the right-hand side of an operator, all subqueries must return a single column, and each row's single value will be treated as a literal value. If the subquery is used in a clause that expects a single value (e.g., in a comparison), only the subquery's first row will be used. If the subquery is used in a clause that allows multiple values (e.g., "IN (...)"), then all of the subquery's rows will be used.

For example, in the following query fragment, the first value in each row of the subquery's results will be used within the IN clause of the outer query:

However, changing the IN clause to a comparison results in only the first value in the first row of the subquery's results being using in the comparison criteria:

Dynamic operand

In various constraints described above, the dynamic operand appears on the left-hand side of an operator, and signifies that the values can only be determined when the query is evaluated.

The standard JCR-SQL2 grammar defines seven kinds of dynamic operands: property value, length, node name, node local name, full-text search score, lowercase, and uppercase.

ModeShape supports all these types, but adds support for four more: reference value, node path, node depth, and simple arithmetic clauses. ModeShape also allows the dynamic operand to be surrounded by parentheses, which is sometimes convenient for complex queries.

The DynamicOperand rule in ModeShape's extended grammar is:

Each of these types of dynamic operands is described in the following subsections.

Property value operand

The property value operand always evaluates to the value(s) of the specified property on the selector.

Note that if the property is multi-valued, the constraint will be satisfied if any of the property values works with the constraint. For example, if the 'acme:tagNames' property is a multi-valued property declared on the "acme:taggable" node type, then the following query will finds all "acme:taggable' nodes that has "foo" for at least one of the values of the 'acme:tagNames' property:

Reference value operand

One of ModeShape's extensions is to support the a "REFERENCE(...)" dynamic operand, which enables placing constraints on one or any of the reference properties.

The "REFERENCE" operand always evaluates to the identifier of the referenced nodes in one or all of the REFERENCE properties. Thus, all of the REFERENCE operands should be used with a StaticOperand that also evaluates to identifiers.

The "REFERENCE()" operand (with no selector name and no property name) evaluates to the identifiers of the nodes referenced by all of reference properties on the node in the only selector. The "REFERENCE(selectorName)" works the same way, but must be used if there is more than one selector in the query. Finally, the "REFERENCE(selectorName.propertyName)" evaluates to the identifiers of nodes referenced by the "propertyName" reference property on the nodes in the named selector.

For example, here is a query that finds all nodes that reference a set of nodes for which we already know the identifiers, "id1", "id2", and "id3".

This operand works really well with subqueries or variables for the right-hand side. For example, here is a query finds all nodes that reference any of the nodes in the subgraph below the "/foo/bar/baz" node, where a subquery is used to find all nodes in the subgraph:

This kind of query is impossible to do using standard JCR-SQL2 features, and shows some of the power of ModeShape's extensions to JCR-SQL2.

Length operand

The length operand evaluates to the length (or lengths, if multi-valued) of a property. The length is defined to be:

  • for a BINARY value, the number of bytes in the value, or
  • for all other value types, the number of characters of the string resulting from a conversion of the value to a string.

The rule for the length operand is:

where PropertyValue rule is defined above.

Node name operand

The node name operand always evaluates to the prefixed name of the node given by the supplied selector:

See also the 'jcr:name' pseudo-column, which enables accessing the JCR name of any node as if the name were a regular property on any node.

Node local name operand

The node name operand always evaluates to the local name of the node given by the supplied selector:

See also the 'mode:localName' pseudo-column, which enables accessing the local name of any node as if the local name were a regular property.

Node depth operand

The node depth operand is a ModeShape-specific extension to the standard set of dynamic operands, and evaluates to the integer depth of the node given by the supplied selector. The depth of a node is defined to be the number of segments in the node's path. For example, the depth of the root node is 0, whereas the depth of the node at "/foo/bar/baz" is 3.

See also the 'mode:depth' pseudo-column, which enables accessing the depth of any node as if the depth were a regular property.

Node path operand

The node path operand is a ModeShape-specific extension to the standard set of dynamic operands, and evaluates to the path of the node given by the supplied selector.

See also the 'jcr:path' pseudo-column, which enables accessing the path of any node as if the path were a regular property.

Full text search score operand

The full-text search score operand evaluates to a DOUBLE value equal to the full-text search score of a node. The full-text search score ranks a selector's nodes by their relevance to the 'fullTextSearchExpression' specified in a [FullTextSearch|#Fulltextsearchconstraint. The magnitude of the scores are implementation specific, but most implementations will produce higher scores with more relevant matching and lower scores for less-relevant matching.

See also the 'jcr:score' pseudo-column, which enables accessing the score of any node as if the score were a regular property.

Lowercase operand

The lowercase operand evaluates to the the lower-case string value (or values, if multi-valued) of operand. If the operand does not evaluate to a string value, its value is first converted to a string.

Uppercase operand

The uppercase operand evaluates to the the upper-case string value (or values, if multi-valued) of operand. If the operand does not evaluate to a string value, its value is first converted to a string.

Arithmetic operand

The arithmetic operand is a ModeShape-specific extension to the standard JCR-SQL2 grammar. It allows two other dynamic operands that evaluate to numeric values to be numerically combined using addition, subtraction, multiplication, or division.

For example, the following query restricts the results such that the sum of the score of nodes originating from separate selectors is greater than 1.0:

So although it's possible to use in the WHERE clause, it's more likely to be used in the order-by clauses. For example, the following query orders the results based upon the difference in the scores of nodes in the two selectors:

Ordering

The "ORDER BY" clause defined by the standard JCR-SQL2 grammar allows the order of the results to be dictated by the values evaluated at execution time based upon one or more dynamic operands. The rule for the expression is as follows:

As with SQL, the "ASC" qualifier specifies that the ordering should be in ascending order, and is the default; likewise, the "DESC" qualifier specifies that the ordering should be in descending order.

Columns

The standard JCR-SQL2 grammar allows a query to include in the "SELECT" clause which property values should be returned and included in the results:

When "*" is used for the list of selected columns, the result set is expected to minimally include, for each selector, a column for each single-valued non-residual property of the selector's node type, including those explicitly declared on the node type and those inherited from the node's supertypes.

For example, the result set for the following query would contain at least the '[jcr:primaryType]' column, since it is the only single-valued, non-residual property defined on the '[nt:base]' node type. The '[jcr:mixinTypes]' property is also non-residual, but the results need not include it since it's multi-valued.

If there are multiple selectors, then "SELECT *" will include all of the selectable columns from each selector's node type. However, it's possible to request all of the selectable columns from some of the selectors, using the form. For example:

Note, however, that although only single-valued, non-residual properties are included when "*" is used in the SELECT clause, it is possible to explicitly include residual properties. For example, the following query finds all nodes that have at least one "foo" value for the 'acme:tagNames' property:

Limit and offset

Neither the standard JCR-SQL2 grammar or the JCR API itself provide support for limiting the rows that are returned in the results. This is a common need, especially for applications that paginate the results, where each page shows a subset of the results.

Because this is such an essential feature that can't be accomplished any other way, ModeShape adds support for specifying the maximum number of rows to return, and optionally specifying the number of initial rows that should be skipped. The ModeShape extension is follows the SQL syntax:

The LIMIT clause is entirely optional, and if absent does not limit the result set rows in any way. However, if the "LIMIT count" clause is used, then the result set will contain no more than count rows. This LIMIT clause may optionally be followed by the "OFFSET number" clause, where number is the number of initial rows that should be skipped before the rows are included in the results.

Pseudo-columns

The design of the JCR-SQL2 query language makes fairly heavy use of functions, including SCORE(), NAME(), LOCALNAME(), and various constraints. ModeShape adds several more useful functions, such as PATH() and DEPTH(), that follow the same patterns.

However, these functions have several disadvantages. First, they make the JCR-SQL2 language less "SQL-like", since SQL-92 and -99 don't define similar kinds of functions. (There are aggregate functions, like COUNT, SUM, etc., but they operate on a particular column in all tuples and are therefore more dissimilar than similar.) This means that applications that use SQL and SQL-like query languages are less likely to be able to build and issue JCR-SQL2 queries.

A second disadvantage of these functions is that JCR-SQL2 does not allow them to be used within the SELECT clause. As a result, the location-related and score information cannot be included as columns of values in the QueryResult rows. Instead, a client can only access this information by obtaining the Node object(s) for each row. Relying upon both the result set and additional Java objects makes it difficult to use the JCR query system. It also makes certain kinds of applications impossible.

For example, ModeShape's JDBC driver is designed to enable JDBC-aware applications to query repository content using JCR-SQL2 queries. The standard JDBC API cannot expose the Node objects, so the only way to return the path-related and score information is through additional columns in the result. While such columns could always "magically" appear in the result set, doing this is not compatible with JDBC applications that dynamically build the SELECT clauses of queries based upon database metadata. Such applications require the columns to be properly described in database metadata, and the columns need to be used within queries.

ModeShape attempts to solve these issues by directly supporting a number of "pseudo-columns" within JCR-SQL2 queries, wherever columns can be used. These "pseudo-columns" include:

  • jcr:score is a column of type DOUBLE that represents the full-text search score of the node, which is a measure of the node's relevance to the full-text search expression. ModeShape does compute the scores for all queries, though the score for rows in queries that do not include a full-text search criteria may not be reliable.
  • jcr:path is a column of type PATH that represents the normalized path of a node, including same-name siblings. This is the same as what would be returned by the getPath() method of Node. Examples of paths include "/jcr:system" and "/foo/bar[3]".
  • jcr:name is a column of type NAME that represents the node name in its namespace-qualified form using namespace prefixes and excluding same-name-sibling indexes. Examples of node names include "jcr:system", "jcr:content", "ex:UserData", and "bar".
  • mode:localName is a column of type STRING that represents the local name of the node, which excludes the namespace prefix and same-name-sibling index. As an example, the local name of the "jcr:system" node is "system", while the local name of the "ex:UserData[3]" node is "UserData".
  • mode:depth is a column of type LONG that represents the depth of a node, which corresponds exactly to the number of path segments within the path. For example, the depth of the root node is 0, whereas the depth of the "/jcr:system/jcr:nodeTypes" node is 2.

All of these pseudo-columns can be used in the SELECT clause of any JCR-SQL2 query, and their use defines whether such columns appear in the result set. In fact, all of these pseudo-columns will be included when "SELECT *" clauses in JCR-SQL2 queries are expanded by the query engine. This means that every node type (even mixin node types that have no properties and are essentially markers) are represented by a queryable table with at least one column. However, unlike the older JCR-SQL query language, these pseudo-columns are never included in the result unless explicitly included or implicitly included with the "SELECT *" clause.

Why did ModeShape use the "jcr" namespace prefix for some of the pseudo-columns, and "mode" for the others? The older JCR-SQL language defined the "jcr:score", "jcr:path", and "jcr:name" pseudo-columns, so we just use the same names. The other columns were unique to ModeShape and are therefore defined with the "mode" namespace prefix.

Like any other column, all of these pseudo-columns can be also be used in the WHERE clause of any JCR-SQL2 query, even if they are not included in the SELECT clause. They can be used anywhere that a regular column can be used, including within constraints and dynamic operands. ModeShape will automatically rewrite queries that use pseudo-columns in the dynamic operands of constraints to use the corresponding function, such as SCORE(), PATH(), NAME(), LOCALNAME(), and DEPTH(). Additionally, any property existence constraint using these pseudo-columns will always evaluate to 'true' (and thus ModeShape's query optimizer will always remove such constraints from the query plan).

The "jcr:path" pseudo-column may also be used on both sides of an equijoin constraint clause. For example, equijoin expressions similar to:

will be automatically rewritten by ModeShape's optimizer to the following form:

As with regular columns, the pseudo-columns must be qualified with the selector name if the query contains more than one selector.

Full-text search grammar

The grammar for the full-text search expressions used in the JCR-SQL2's full-text search constraint is as follows:

This grammar supports expressions similar to what you might provide to an internet search engine. Essentially, it simply lists the terms or phrases that should appear (or not appear) in the applicable property value(s). Simple terms consist of a single word (with only non-space characters), while phrases can simply be surrounded with double quotes.

Example JCR-SQL2 queries

This section walks through several JCR-SQL2 example queries, describing what each one does and, in some cases, providing alternative queries.

The basics

One of the simplest JCR-SQL2 queries finds all nodes in the current workspace of the repository:

This query will return a result set containing the "jcr:primaryType" column, since the nt:base node type defines only one single-valued, non-residual property called "jcr:primaryType".

ModeShape does not currently support returning multi-valued properties in result sets. This is permitted by the JCR 2.0 specification. ModeShape does, however, support using multi-valued properties in constraints and ORDER BY clauses.

Since our query used "SELECT *", ModeShape also includes the five non-standard pseudo-columns mentioned above: "jcr:path", "jcr:score", "jcr:name", "mode:localName", and "mode:depth". These columns are very convenient to have in the results, but also make certain criteria much easier than with the corresponding standard or ModeShape-specific functions.

Queries can explicitly specify the columns that are to be returned in the results. The following query is very similar to the previous query and will return the same rows, but the result set will have only a single column and will not include any of the pseudo-columns:

The following query will return the same rows as in the previous two queries, but the SELECT clause explicitly includes only two of the pseudo-columns for the path and depth (which are computed from the nodes' locations):

In JCR-SQL2, a table representing a particular node type will have a column for each of the node type's property definitions, including those inherited from supertypes. For example, the nt:file node type, its nt:hierarchyNode supertype, and the mix:created mixin type are defined using the CND notation as follows:

Therefore, the table representing the nt:file node type will have 3 columns: the "jcr:created" and "jcr:createdBy" columns inherited from the mix:created mixin node type (via the nt:hierarchyNode node type), and the "jcr:primaryType" column inherited from the nt:base node type, which is the implicit supertype of the nt:hierarchyNode (and all node types).

ModeShape adheres to this behavior with the exception that a "SELECT *" will result in the additional pseudo-columns. Thus, this next query:

is equivalent to this query:

Using columns in constraints

Next, let's look at a query that selects some of the available columns from the nt:file table and uses a constraint to ensure the resulting file nodes have names that end in '.txt':

ModeShape also supports placing criteria against the mode:localName pseudo-column instead of using the LOCALNAME() function. Such a query is equivalent to the previous query and will produce the exact same results:

ModeShape's pseudo-columns are often far easier to use than the corresponding function-like constraints.

Although this query looks much more like SQL, the use of the '[' and ']' characters to quote the identifiers is not typical of a SQL dialect. ModeShape actually supports the using double-quote characters and square braces interchangeably around identifiers (although they must match around any single identifier). Again, this next query, which looks remarkably like any SQL-92 or -99 dialect, is functionally identical to the previous two queries:

Inner joins

In JCR-SQL2, a node will appear as a row in each table that corresponds to the node types defined by that node's primary type or mixin types, or any supertypes of these node types. In other words, a node will appear in the table corresponding to each node type for which Node.isNodeType(...) returns true.

For example, consider a node that has a primary type of nt:file but has an explicit mixin of mix:referenceable. This node will appear as a row in the all of these tables:

  • nt:file
  • mix:referenceable
  • nt:hierarchyNode
  • mix:created
  • nt:base

However, the columns in each of these tables will differ. The nt:file node type has the nt:hierarchyNode, mix:created, and nt:base for supertypes, and therefore the table for nt:file contains columns for the property definitions on all of these types. But because mix:referenceable is not a supertype of nt:file, the table for nt:file will not contain a jcr:uuid column. To obtain a single result set that contains columns for all the properties of our node, we need to perform an identity join.

The next query shows how to return all properties for nt:file nodes that are also mix:referenceable:

Since wildcards were used in the SELECT clause, ModeShape expands the SELECT clause to include the columns for all (explicit and inherited) property definitions of each type plus pseudo-columns for each type, which is equivalent to:

Note because we are using an identity join, the "file.[jcr:path]" column will contain the same value as the "ref.[jcr:path]".

Fully-expand the SELECT clause to specify exactly the columns that you want, excluding the columns that return the same values or return values not needed by your application. This can also make the query a bit more efficient, since less data needs to be found and returned.

By the way, this is also what many well-written applications do when querying SQL databases.

Here is a query that does this by eliminating columns with duplicate values and using aliases that are simpler than the namespace-qualified names:

Although this query looks much more like SQL, use of the '[' and ']' characters in JCR-SQL2 to quote the identifiers is not typical of a SQL dialect. Again, ModeShape supports the using double-quote characters and square braces interchangeably around identifiers (although they must match around any single identifier). This makes it easier for existing SQL-oriented tools and applications to work more readily with ModeShape, including applications that use ModeShape's JDBC driver to query a ModeShape JCR repository.

This next query, which looks remarkably like any SQL-92 or -99 dialect, is functionally identical to the previous query. However, it uses double quotes and a pseudo-column identity constraint on "jcr:path" (which is identical in semantics and performance as the "ISSAMENODE(...)" constraint):

When using joins and selecting multiple columns, use aliases on the columns to make it easier to reference those columns in constraints and ordering clauses.

Other joins

These are examples of two-way inner joins, but ModeShape supports joining multiple tables together in a single query. ModeShape also supports a variety of joins, including:

  • INNER JOIN (or just JOIN)
  • LEFT OUTER JOIN
  • RIGHT OUTER JOIN
  • FULL OUTER JOIN
  • CROSS JOIN

Set operations: unions, intersections, and complements

ModeShape also supports several other query features beyond JCR-SQL2. One of these is support for set queries that use:

  • UNION and UNION ALL
  • INTERSECT and INTERSECT ALL
  • EXCEPT and EXCEPT ALL.

Here is an example of a union:

Subqueries

ModeShape also supports using (non-correlated) subqueries within the WHERE clause and wherever a static operand can be used. Subqueries can even be used within another subquery. All subqueries, though, should return a single column (all other columns will be ignored), and each row's single value will be treated as a literal value. If the subquery is used in a clause that expects a single row (e.g., in a comparison), only the subquery's first row will be used.

Subqueries in ModeShape are a powerful and easy way to use more complex criteria that is a function of the content in the repository, without having to resort to multiple queries and complex application logic, such as taking the results of one query and dynamically generating the criteria of another query.

Here's an example of a query that finds all nt:file nodes in the repository whose paths are referenced in the value of the vdb:originalFile property of the vdb:virtualDatabase nodes. (This query also uses the "$maxVersion" variable in the subquery.)

Without subqueries, this query would need to be broken into two separate queries: the first would find all of the paths referenced by the vdb:virtualDatabase nodes matching the version and description criteria, followed by one (or more) subsequent queries to find the nt:file nodes with the paths expressed as literal values (or variables).

Using a subquery is not only easier to implement and understand, it is actually more efficient.
Labels:
None
Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.