Extended JCR-SQL2 Grammar
The rest of this section documents the full grammar for ModeShape's extended JCR-SQL2 support. Again, this grammar is a strict superset of that defined by the JCR 2.0 specification. In other words, ModeShape will support any JCR-SQL2 query that uses the standard grammar, but it will also support queries that make use of the ModeShape extensions.
Queries
The top-level rule for ModeShape's extended JCR-SQL2 grammar is QueryCommand, which consists of both Query and SetQuery:
QueryCommand ::= Query | SetQuery
SetQuery ::= Query ('UNION'|'INTERSECT'|'EXCEPT') ['ALL'] Query
{ ('UNION'|'INTERSECT'|'EXCEPT') ['ALL'] Query }
Query ::= 'SELECT' ['DISTINCT'] columns
'FROM' Source
['WHERE' Constraint]
['ORDER BY' orderings]
[Limit]
ModeShape adds the concept of a set query, which is a query that performs a union, intersection, or complement of the results of two other queries. Set queries are common in SQL (which is essentially a set manipulation language) and are a very useful tool that would otherwise require significant processing of the results of multiple queries by the application. By supporting set queries, the application merely needs to declare that set operation be performed, and ModeShape will perform all the work before returning the results.
ModeShape also adds the ability to use "SELECT DISTINCT", which eliminates duplicate rows in a manner similar to SQL.
Source
A source is a named set of tuples, which in ModeShape corresponds to the nodes of a particular named node type. In other words, a source is equivalent to a table in a relational database. The available columns of a source are the named properties declared on the node type.
In the JCR-SQL2 grammar, a source is either a selector (a named node type) or a join specification:
Source ::= Selector | Join
Selector ::= nodeTypeName ['AS' selectorName]
nodeTypeName ::= Name
selectorName ::= /* A string that contains only SQL-legal characters,
and which can be used elsewhere in the query to
refer to the selector. */
See the Name rule below.
Joins
The JCR 2.0 specification does include joins in the standard JCR-SQL2 grammar, though the only defined types of joins included inner, left outer, and right outer joins. Because SQL also defines the useful full outer and cross join types, ModeShape adds support for these.
Join ::= left [JoinType] 'JOIN' right 'ON' JoinCondition
/* If JoinType is omitted INNER is assumed. */
left ::= Source
right ::= Source
JoinType ::= Inner | LeftOuter | RightOuter | FullOuter | Cross
Inner ::= 'INNER' ['JOIN']
LeftOuter ::= 'LEFT JOIN' | 'OUTER JOIN' | 'LEFT OUTER JOIN'
RightOuter ::= 'RIGHT OUTER' ['JOIN']
RightOuter ::= 'FULL OUTER' ['JOIN']
RightOuter ::= 'CROSS' ['JOIN']
JoinCondition ::= EquiJoinCondition |
SameNodeJoinCondition |
ChildNodeJoinCondition |
DescendantNodeJoinCondition
Each of the four kinds of join conditions are described below.
Equi-join condition
An equijoin is a join that uses only equality comparisons in the join predicate (or join condition). Using any other operators (e.g., '<' or '!=') in the join condition disqualifies a query from being an equi-join.
Therefore, the rules for the equi-join condition are as follows:
EquiJoinCondition ::= selector1Name'.'property1Name
'=' selector2Name'.'property2Name
selector1Name ::= selectorName
selector2Name ::= selectorName
property1Name ::= propertyName
property2Name ::= propertyName
propertyName ::= Name
where the node type referenced by the selector identified in the query with the selector1Name must contain the property given by the property1Name literal, and similarly the node type referenced by the selector identified in the query with the selector2Name must contain the property given by the property2Name literal.
See also the name rule below.
Same-node join condition
An identity join is a special case of an equijoin, where the compared properties are node identifiers. Thus the join condition of an identity join constrains the node on one sides of the join to be the same node on the other side of the join. The standard JCR-SQL2 grammar defines a special function that makes this a little easier to use:
SameNodeJoinCondition ::= 'ISSAMENODE(' selector1Name
',' selector2Name
[',' selector2Path] ')'
selector1Name ::= selectorName
selector2Name ::= selectorName
selector2Path ::= Path
See also the path rule below.
Child-node join condition
A child-node join is one where the join condition constrains the node on the left side of the join to be a child of the node on the right side of the join. The standard JCR-SQL2 grammar defines a special function that makes it easier to specify such join conditions:
ChildNodeJoinCondition ::= 'ISCHILDNODE('
childSelectorName ','
parentSelectorName ')'
childSelectorName ::= selectorName
parentSelectorName ::= selectorName
Descendant-node join condition
A descendant-node join is one where the join condition constrains the node on the left side of the join to be a descendant of the node on the right side of the join. The standard JCR-SQL2 grammar defines a special function that makes it easier to specify such join conditions:
DescendantNodeJoinCondition ::= 'ISDESCENDANTNODE('
descendantSelectorName ','
ancestorSelectorName ')'
descendantSelectorName ::= selectorName
ancestorSelectorName ::= selectorName
Constraints
The Query rule defined above included a "WHERE" clause that can define multiple constraints on the nodes included in the results. The standard JCR-SQL2 grammar defined several such constraints, including and, or, not, comparison, property existence, full-text search, same-node, child-node, and descendant-node constraints. ModeShape supports all of these, but adds two others: between and set constraints.
Constraint ::= ConstraintItem | '(' ConstraintItem ')'
ConstraintItem ::= And | Or | Not
Comparison | Between |
PropertyExistence |
SetConstraint |
FullTextSearch |
SameNode |
ChildNode |
DescendantNode |
Relike
Each of these types of constraints are described below.
And constraint
An and constraint stipulates that a node (or record or tuple) is included only if two other constraints are both true.
And ::= constraint1 'AND' constraint2
constraint1 ::= Constraint
constraint2 ::= Constraint
Or constraint
An or constraint stipulates that a node (or record or tuple) is included if either of two other constraints are true.
Or ::= constraint1 'OR' constraint2
constraint1 ::= Constraint
constraint2 ::= Constraint
Not constraint
The not qualifier will negate another constraint, requiring that a node (or record or tuple) is included if the other constraint is not true.
Not ::= 'NOT' constraint
constraint ::= Constraint
Comparison constraint
A comparison constraint requires that the value for a node described by the dynamic operand on the left side of the operator is to be compared to a static literal value. The term "dynamic operand" is used in the JCR-SQL2 grammar because its value can only be determined during query evaluation.
Comparison ::= DynamicOperand Operator StaticOperand
Operator ::= '=' | '!=' | '<' | '<=' | '>' | '>=' | 'LIKE' | 'NOT LIKE'
The DynamicOperand and StaticOperand rules are defined below.
The behavior of the operators is dictated by the JCR 2.0 specification and matches how Value objects are compared:
-
If the DynamicOperand evaluates to null, the constraint is not satisfied.
-
If the '=" operator is used, the value that the DynamicOperand evaluates to must equal the StaticOperand value for the constraint to be satisfied.
-
If the '!=" operator is used, the value that the DynamicOperand evaluates to must not equal the StaticOperand value for the constraint to be satisfied.
-
If the '<" operator is used, the value that the DynamicOperand evaluates to must be less than the StaticOperand value for the constraint to be satisfied.
-
If the '<=" operator is used, the value that the DynamicOperand evaluates to must be less than or equal to the StaticOperand value for the constraint to be satisfied.
-
If the '>" operator is used, the value that the DynamicOperand evaluates to must be greater than the StaticOperand value for the constraint to be satisfied.
-
If the '>=" operator is used, the value that the DynamicOperand evaluates to must be greater than or equal to the StaticOperand value for the constraint to be satisfied.
-
If the "LIKE" operator is used, the constraint is only satisfied if the value that the DynamicOperand evaluates to match the pattern specified by the string literal StaticOperand, where in the pattern:
-
the character "%" matches zero or more characters, and
-
the character "_" (underscore) matches exactly one character, and
-
the string "\x" matches the character "x", and
-
all other characters match themselves
-
If the "NOT LIKE" operator is used, the constraint is only satisfied if the value that the DynamicOperand evaluates to not match the pattern specified by the string literal StaticOperand, where in the pattern:
-
the character "%" matches zero or more characters, and
-
the character "_" (underscore) matches exactly one character, and
-
the string "\x" matches the character "x", and
-
all other characters match themselves
Also, note that, unlike SQL, the standard JCR-SQL2 grammar does not allow the left-hand side and right-hand sides of a comparison constraint to be swapped.
Between constraint
The between constraint is one of the extensions defined by ModeShape, and allows a query to more easily represent a range of static values than using only the constraints available in the standard JCR-SQL2 grammar. The between constraint is based on the similar expression in SQL.
Between ::= DynamicOperand ['NOT'] 'BETWEEN'
lowerBound ['EXCLUSIVE'] 'AND'
upperBound ['EXCLUSIVE']
lowerBound ::= StaticOperand
upperBound ::= StaticOperand
Again, the DynamicOperand and StaticOperand rules are defined below.
Property existence constraint
A property existence constraint stipulates that a property does indeed exist on a node that is of the node type specified by the named selector. ModeShape does allow the "NOT" qualifier to be excluded, which turns the constraint into a stipulation that the property does not exist on the node.
PropertyExistence ::= [selectorName'.']propertyName 'IS' ['NOT'] 'NULL'
/* If only one selector exists in this query,
explicit specification of the selectorName
preceding the propertyName is optional */
Set constraint
Like the between constraint, the set constraint is a ModeShape extension to the standard JCR-SQL2 grammar that allows what would normally be a complicated combination of standard JCR-SQL2 constraints to be more easily represented with a single, simple expression. Again, this constraint is patterned after the similar expression in SQL.
SetConstraint ::= [selectorName '.']propertyName ['NOT'] 'IN'
'(' firstStaticOperand
{',' additionalStaticOperand }
')'
/* If only one selector exists in this query,
explicit specification of the selectorName
preceding the propertyName is optional */
firstStaticOperand ::= StaticOperand
additionalStaticOperand ::= StaticOperand
Note that multiple static operands can be included in the comma-separated list. The StaticOperand rule is defined below.
Although this rule seems complicated, it's actually very straightforward. The following query selects all the properties defined on the "acme:taggable" node type, returning only those "taggable" nodes with a "acme:tagname" value of "tag1", "tag2", "tag3", or "tag4":
SELECT * FROM [acme:taggable] as tagged
WHERE tagged.[acme:tagName] IN ('tag1','tag2','tag3','tag4')
Even this trivial query is quite a bit simpler and easier to understand than if the query had used only the constraints defined by the standard JCR-SQL2 grammar:
SELECT * FROM [acme:taggable] as tagged
WHERE tagged.[acme:tagName] = 'tag1'
OR tagged.[acme:tagName] = 'tag2'
OR tagged.[acme:tagName] = 'tag3'
OR tagged.[acme:tagName] = 'tag4'
Imagine how complicated a query might be with multiple joins, multiple criteria, and many values to be compared for one or several different properties.
Full-text search constraint
FullTextSearch ::= 'CONTAINS('
([selectorName'.']propertyName | selectorName'.*')
',' ''' fullTextSearchExpression''' ')'
/* If only one selector exists in this query,
explicit specification of the selectorName
preceding the propertyName is optional */
fullTextSearchExpression ::= FulltextSearch
The full-text search expression is a string literal that adheres to the full-text search grammar described below.
An example query selects all the properties defined on the "acme:taggable" node type, returning only those "taggable" nodes with a "acme:tagname" value that contains the "foo" term within the value:
SELECT * FROM [acme:taggable] as tagged
WHERE CONTAINS(tagged.[acme:tagName],'foo')
Same-node constraint
The same-node constraint stipulates that the node appearing in the selector with the given name has a path that matches the literal path provided.
SameNode ::= 'ISSAMENODE(' [selectorName ','] Path ')'
/* If only one selector exists in this query,
explicit specification of the selectorName
preceding the propertyName is optional */
where the Path rule is defined below.
Because this standard constraint clause is not really like traditional SQL, ModeShape defines a 'jcr:path' pseudo-column that can be used in comparison constraints and that allows for using other comparison operators, including "LIKE".
Child-node constraint
The child-node constraint stipulates that the node appearing in the selector with the given name is a child of a node with a path that matches the literal path provided.
ChildNode ::= 'ISCHILDNODE(' [selectorName ','] Path ')'
/* If only one selector exists in this query,
explicit specification of the selectorName
preceding the propertyName is optional */
See also ModeShape's 'jcr:path' pseudo-column that can be used in comparison constraints and that allows for using other comparison operators, including "LIKE". And because the right hand side (i.e., static operand) of a "LIKE" expression can involve wildcards, it may be easier and more understandable to use the pseudo-column.
Descendant-node constraint
The descendant-node constraint stipulates that the node appearing in the selector with the given name is a descendant of a node with a path that matches the literal path provided.
DescendantNode ::= 'ISDESCENDANTNODE(' [selectorName ','] Path ')'
/* If only one selector exists in this query,
explicit specification of the selectorName
preceding the propertyName is optional */
See also ModeShape's 'jcr:path' pseudo-column that can be used in comparison constraints and that allows for using other comparison operators, including "LIKE". And because the right hand side (i.e., static operand) of a "LIKE" expression can involve wildcards, it may be easier and more understandable to use the pseudo-column.
Reverse Like constraint
This is first available starting with ModeShape 3.6.
The standard JCR-SQL2 LIKE operator takes as input a fixed pattern and then attempts to find nodes whose value for the given property matches the pattern. However, sometimes the property values might actually store the patterns, and you want to find all property patterns that match a given fixed string parameter. The RELIKE (or "reverse like") makes this possible. To make this function more clear from the standard LIKE, the operands are reversed so that the pattern is the second parameter RELIKE.
For example, if the property of node "my:namePattern" on a node of type "my:country" contains a matching pattern of country name (e.g., "%Monaco%"), then we can find all "my:country" nodes that have a name pattern that matches "Principality of Monaco":
SELECT *
FROM [my:country] AS country
WHERE RELIKE("Principality of Monaco", country.[my:namePattern]);
Path and name
Many of the rules above have used paths and names, and the rules for these are defined as follows:
Name ::= '[' quotedName ']' | '[' simpleName ']' | simpleName
quotedName ::= /* A JCR Name (see the JCR specification) */
simpleName ::= /* A JCR Name that contains only SQL-legal
characters (namely letters, digits, and underscore) */
Path ::= '[' quotedPath ']' | '[' simplePath ']' | simplePath
quotedPath ::= /* A JCR Path that contains non-SQL-legal characters */
simplePath ::= /* A JCR Path (rather Name) that contains only SQL-legal
characters (namely letters, digits, and underscore) */
Note that JCR-SQL2 surrounds identifiers with square brackets (e.g., '[' and ']'), allowing names to contain a ':' character needed with namespaced names. If the names or paths only contain valid SQL characters, then they do not need to be quoted.
Static operand
In the standard JCR-SQL2 grammar, a static operand appears on the right-hand side of an operator, and represents an expression whose value can be determined by static analysis of the query (e.g., when the query is parsed). In particular, a static operand in the standard JCR-SQL2 grammar comprised of either a literal value or a variable.
In SQL, however, the expression that appears on the right-hand side of an operator is not always able to be determined at query parse time. An example is a subquery, which appears on the right hand side but obviously can only be evaluated into values during query execution time. Since standard JCR-SQL2 does not include any such features, the term "static operand" is technically valid.
In addition to literal values and variables, ModeShape also supports subqueries appearing on the right-hand side of an operator. So this grammar continues to use the "static operand" term for easy comparison with the standard JCR-SQL2 grammar, but the term has a different (and expanded) semantic than in the standard grammar.
Therefore, the rules for what ModeShape allows on the right-hand side of an operator in a constraint is as follows:
StaticOperand ::= Literal | BindVariableValue | Subquery
Literal ::= CastLiteral | UncastLiteral
CastLiteral ::= 'CAST(' UncastLiteral ' AS ' PropertyType ')'
PropertyType ::= 'STRING' | 'BINARY' | 'DATE' |
'LONG' | 'DOUBLE' | 'DECIMAL' |
'BOOLEAN' | 'NAME' | 'PATH' |
'REFERENCE' | 'WEAKREFERENCE' |
'URI'
UncastLiteral ::= UnquotedLiteral |
''' UnquotedLiteral ''' |
'"' UnquotedLiteral '"'
UnquotedLiteral ::= /* String form of a JCR Value,
as defined in the JCR specification */
where the grammar rules for BindVariableValue and Subquery are defined below.
Bind variable
The standard JCR-SQL2 grammar supports using variable names within a query, where the values for those variables are bound to the Query object before execution. In the query, the variable names are prefixed with a '$' character and are otherwise normal JCR name:
BindVariableValue ::= '$'bindVariableName
bindVariableName ::= /* A string that conforms to the JCR Name syntax,
though the prefix does not need to be a
registered namespace prefix. */
So, consider this simple query that selects all the properties defined on the "acme:taggable" node type, and that returns only those "taggable" nodes with a "acme:tagname" value that matches the value of the "tagValue" variable:
SELECT * FROM [acme:taggable] as tagged
WHERE tagged.[acme:tagName] = $tagValue
This query could be evaluated using the JCR API as follows:
// Obtain the query manager for the session via the workspace ...
javax.jcr.Session session = // ...
javax.jcr.query.QueryManager mgr = session.getWorkspace().getQueryManager();
// Create a query object ...
String language = ...
String expression = ...
javax.jcr.query.Query query = queryManager.createQuery(expression,language);
// Bind a value to the variable ...
Value tag = session.getValueFactory().create("foo");
query.bindVariable("tagValue",tag);
// Execute the query and get the results ...
javax.jcr.query.QueryResult result = query.execute();
Obviously multiple variables can be used in a query expression, but a value must be bound to every variable before the Query object can be executed.
Subquery
The standard JCR-SQL2 grammar does not support subqueries. But subqueries are such a useful feature, so ModeShape supports using multiple subqueries within a single query. In fact, subqueries are nothing more than a QueryCommand, which if you'll remember is the top-level rule in ModeShape's grammar. That means that subqueries can be any query, and you can even include subqueries within a subquery!
Subquery ::= '(' QueryCommand ')' |
QueryCommand
Strictly speaking, ModeShape only supports non-correlated subqueries, which means that they can actually be evaluated independently (outside the context of the containing query).
Additionally, because subqueries appear on the right-hand side of an operator, all subqueries must return a single column, and each row's single value will be treated as a literal value. If the subquery is used in a clause that expects a single value (e.g., in a comparison), only the subquery's first row will be used. If the subquery is used in a clause that allows multiple values (e.g., "IN (...)"), then all of the subquery's rows will be used.
For example, in the following query fragment, the first value in each row of the subquery's results will be used within the IN clause of the outer query:
WHERE ... [my:type].[prop1] IN
( SELECT [my:prop2] FROM [my:type2]
WHERE [my:prop3] < '1000' )
AND ...
However, changing the IN clause to a comparison results in only the first value in the first row of the subquery's results being using in the comparison criteria:
WHERE ... [my:type].[prop1] =
( SELECT [my:prop2] FROM [my:type2]
WHERE [my:prop3] < '1000' )
AND ...
Dynamic operand
In various constraints described above, the dynamic operand appears on the left-hand side of an operator, and signifies that the values can only be determined when the query is evaluated.
The standard JCR-SQL2 grammar defines seven kinds of dynamic operands: property value, length, node name, node local name, full-text search score, lowercase, and uppercase.
ModeShape supports all these types, but adds support for four more: reference value, node path, node depth, and simple arithmetic clauses. ModeShape also allows the dynamic operand to be surrounded by parentheses, which is sometimes convenient for complex queries.
The DynamicOperand rule in ModeShape's extended grammar is:
DynamicOperand ::= PropertyValue | ReferenceValue |
Length | NodeName | NodeLocalName |
NodePath | NodeDepth | ChildCount |
FullTextSearchScore |
LowerCase | UpperCase |
Arithmetic |
'(' DynamicOperand ')'
Each of these types of dynamic operands is described in the following subsections.
Property value operand
The property value operand always evaluates to the value(s) of the specified property on the selector.
PropertyValue ::= [selectorName'.'] propertyName
/* If only one selector exists in this query,
explicit specification of the selectorName
preceding the propertyName is optional. */
Note that if the property is multi-valued, the constraint will be satisfied if any of the property values works with the constraint. For example, if the 'acme:tagNames' property is a multi-valued property declared on the "acme:taggable" node type, then the following query will finds all "acme:taggable' nodes that has "foo" for at least one of the values of the 'acme:tagNames' property:
SELECT * FROM [acme:taggable] as tagged
WHERE tagged.[acme:tagNames] = 'foo'
Reference value operand
One of ModeShape's extensions is to support the a "REFERENCE(...)" dynamic operand, which enables placing constraints on one or any of the reference properties.
ReferenceValue ::= 'REFERENCE(' selectorName '.' propertyName ')' |
'REFERENCE(' selectorName ')' |
'REFERENCE()' |
/* If only one selector exists in this query,
explicit specification of the selectorName
preceding the propertyName is optional.
Also, the property name may be excluded
if the constraint should apply to any
reference property.*/
The "REFERENCE" operand always evaluates to the identifier of the referenced nodes in one or all of the REFERENCE properties. Thus, all of the REFERENCE operands should be used with a StaticOperand that also evaluates to identifiers.
The "REFERENCE()" operand (with no selector name and no property name) evaluates to the identifiers of the nodes referenced by all of reference properties on the node in the only selector. The "REFERENCE(selectorName)" works the same way, but must be used if there is more than one selector in the query. Finally, the "REFERENCE(selectorName.propertyName)" evaluates to the identifiers of nodes referenced by the "propertyName" reference property on the nodes in the named selector.
For example, here is a query that finds all nodes that reference a set of nodes for which we already know the identifiers, "id1", "id2", and "id3".
SELECT * FROM [nt:base]
WHERE REFERENCE() IN ('id1','id2','id3')
This operand works really well with subqueries or variables for the right-hand side. For example, here is a query finds all nodes that reference any of the nodes in the subgraph below the "/foo/bar/baz" node, where a subquery is used to find all nodes in the subgraph:
SELECT * FROM [nt:base]
WHERE REFERENCE() IN (
SELECT [jcr:uuid] FROM [nt:base] AS refd
WHERE ISDESCENDANTNODE(refd,'/foo/bar/baz')
)
This kind of query is impossible to do using standard JCR-SQL2 features, and shows some of the power of ModeShape's extensions to JCR-SQL2.
Length operand
The length operand evaluates to the length (or lengths, if multi-valued) of a property. The length is defined to be:
-
for a BINARY value, the number of bytes in the value, or
-
for all other value types, the number of characters of the string resulting from a conversion of the value to a string.
The rule for the length operand is:
Length ::= 'LENGTH(' PropertyValue ')'
where PropertyValue rule is defined above.
Node name operand
The node name operand always evaluates to the prefixed name of the node given by the supplied selector:
NodeName ::= 'NAME(' [selectorName] ')'
/* If only one selector exists in this query,
explicit specification of the selectorName
is optional */
See also the 'jcr:name' pseudo-column, which enables accessing the JCR name of any node as if the name were a regular property on any node.
Node local name operand
The node name operand always evaluates to the local name of the node given by the supplied selector:
NodeLocalName ::= 'LOCALNAME(' [selectorName] ')'
/* If only one selector exists in this query,
explicit specification of the selectorName
is optional */
See also the 'mode:localName' pseudo-column, which enables accessing the local name of any node as if the local name were a regular property.
Node depth operand
The node depth operand is a ModeShape-specific extension to the standard set of dynamic operands, and evaluates to the integer depth of the node given by the supplied selector. The depth of a node is defined to be the number of segments in the node's path. For example, the depth of the root node is 0, whereas the depth of the node at "/foo/bar/baz" is 3.
NodeDepth ::= 'DEPTH(' [selectorName] ')'
/* If only one selector exists in this query,
explicit specification of the selectorName
is optional */
See also the 'mode:depth' pseudo-column, which enables accessing the depth of any node as if the depth were a regular property.
Node path operand
The node path operand is a ModeShape-specific extension to the standard set of dynamic operands, and evaluates to the path of the node given by the supplied selector.
NodePath ::= 'PATH(' [selectorName] ')'
/* If only one selector exists in this query,
explicit specification of the selectorName
is optional */
See also the 'jcr:path' pseudo-column, which enables accessing the path of any node as if the path were a regular property.
Child count operand
The child count operand is a ModeShape-specific extension to the standard set of dynamic operands, and evaluates to the integer number of children of the node given by the supplied selector. The child count for a node that has no children is 0.
NodeDepth ::= 'CHILDCOUNT(' [selectorName] ')'
/* If only one selector exists in this query,
explicit specification of the selectorName
is optional */
There is currently no way to create an index based upon the child count. Therefore, the child count operand should always be used with other criteria to which indexes do apply.
Full text search score operand
The full-text search score operand evaluates to a DOUBLE value equal to the full-text search score of a node. The full-text search score ranks a selector's nodes by their relevance to the 'fullTextSearchExpression' specified in a [FullTextSearch|#Fulltextsearchconstraint. The magnitude of the scores are implementation specific, but most implementations will produce higher scores with more relevant matching and lower scores for less-relevant matching.
FullTextSearchScore ::= 'SCORE(' [selectorName] ')'
/* If only one selector exists in this query,
explicit specification of the selectorName
is optional */
See also the 'jcr:score' pseudo-column, which enables accessing the score of any node as if the score were a regular property.
Lowercase operand
The lowercase operand evaluates to the the lower-case string value (or values, if multi-valued) of operand. If the operand does not evaluate to a string value, its value is first converted to a string.
LowerCase ::= 'LOWER(' DynamicOperand ')'
Uppercase operand
The uppercase operand evaluates to the the upper-case string value (or values, if multi-valued) of operand. If the operand does not evaluate to a string value, its value is first converted to a string.
UpperCase ::= 'UPPER(' DynamicOperand ')'
Arithmetic operand
The arithmetic operand is a ModeShape-specific extension to the standard JCR-SQL2 grammar. It allows two other dynamic operands that evaluate to numeric values to be numerically combined using addition, subtraction, multiplication, or division.
Arithmetic ::= DynamicOperand ('+'|'-'|'*'|'/') DynamicOperand
For example, the following query restricts the results such that the sum of the score of nodes originating from separate selectors is greater than 1.0:
SELECT * FROM [acme:type1] AS type1
JOIN [acme:type2] as type2 ON type1.prop1 < type2.prop2
WHERE SCORE(type1) + SCORE(type2) > 1.0
So although it's possible to use in the WHERE clause, it's more likely to be used in the order-by clauses. For example, the following query orders the results based upon the difference in the scores of nodes in the two selectors:
SELECT * FROM [acme:type1] AS type1
JOIN [acme:type2] as type2 ON type1.prop1 < type2.prop2
ORDER BY ( SCORE(type1) - SCORE(type2) ) ASC,
LENGTH(type2.prop3) DESC
Cast operand
Available starting with ModeShape 4.5.0.Final
The cast operand is a ModeShape-specific extension to the standard JCR-SQL2 grammar. It allows casting a PropertyValue to a specific type, before applying the query constraint.
Cast ::= 'CAST(' PropertyValue ' AS ' PropertyType ')'
For example, the following query forces a string property to be evaluated as a double when performing the "less than equal" constraint:
SELECT * FROM [node] as x where CAST(x.stringProperty AS DOUBLE) <= 20.50
Of course, in case the value of a property cannot be cast to the specified type, a ValueFormatException will be thrown at runtime, when retrieving the results of the query execution.
Ordering
The "ORDER BY" clause defined by the standard JCR-SQL2 grammar allows the order of the results to be dictated by the values evaluated at execution time based upon one or more dynamic operands. The rule for the expression is as follows:
orderings ::= Ordering {',' Ordering}
Ordering ::= DynamicOperand [Order] [NullOrder]
Order ::= 'ASC' | 'DESC'
NullOrder ::= 'NULLS FIRST' | 'NULLS LAST'
As with SQL, the "ASC" qualifier specifies that the ordering should be in ascending order, and is the default; likewise, the "DESC" qualifier specifies that the ordering should be in descending order.
The "NULLS FIRST" and "NULLS LAST" qualifiers specify whether rows with null values in the columns specified by the dynamic operand
should appear first (before rows with non-null values) or last (after rows with non-null values). If the null ordering is not specified then the handling of the null values is:
Columns
The standard JCR-SQL2 grammar allows a query to include in the "SELECT" clause which property values should be returned and included in the results:
columns ::= (Column ',' {Column}) | '*'
Column ::= ([selectorName'.']propertyName ['AS' columnName]) |
(selectorName'.*')
/* If only one selector exists in this query,
explicit specification of the selectorName
is optional */
selectorName ::= Name
propertyName ::= Name
columnName ::= Name
When "*" is used for the list of selected columns, the result set is expected to minimally include, for each selector, a column for each single-valued non-residual property of the selector's node type, including those explicitly declared on the node type and those inherited from the node's supertypes.
For example, the result set for the following query would contain at least the '[jcr:primaryType]' column, since it is the only single-valued, non-residual property defined on the '[nt:base]' node type. The '[jcr:mixinTypes]' property is also non-residual, but the results need not include it since it's multi-valued.
If there are multiple selectors, then "SELECT *" will include all of the selectable columns from each selector's node type. However, it's possible to request all of the selectable columns from some of the selectors, using the form. For example:
SELECT type1.*
FROM [acme:type1] AS type1
JOIN [acme:type2] as type2 ON type1.prop1 < type2.prop2
Note, however, that although only single-valued, non-residual properties are included when "*" is used in the SELECT clause, it is possible to explicitly include residual properties. For example, the following query finds all nodes that have at least one "foo" value for the 'acme:tagNames' property:
SELECT [acme:tagNames] AS tagName
FROM [nt:base] WHERE tagName = 'foo'
Limit and offset
Neither the standard JCR-SQL2 grammar or the JCR API itself provide support for limiting the rows that are returned in the results. This is a common need, especially for applications that paginate the results, where each page shows a subset of the results.
Because this is such an essential feature that can't be accomplished any other way, ModeShape adds support for specifying the maximum number of rows to return, and optionally specifying the number of initial rows that should be skipped. The ModeShape extension is follows the SQL syntax:
Limit ::= 'LIMIT' count [ 'OFFSET' offset ]
count ::= /* Positive integer value */
offset ::= /* Non-negative integer value */
The LIMIT clause is entirely optional, and if absent does not limit the result set rows in any way. However, if the "LIMIT count" clause is used, then the result set will contain no more than count rows. This LIMIT clause may optionally be followed by the "OFFSET number" clause, where number is the number of initial rows that should be skipped before the rows are included in the results.
Pseudo-columns
The design of the JCR-SQL2 query language makes fairly heavy use of functions, including SCORE(), NAME(), LOCALNAME(), and various constraints. ModeShape adds several more useful functions, such as PATH() and DEPTH(), that follow the same patterns.
However, these functions have several disadvantages. First, they make the JCR-SQL2 language less "SQL-like", since SQL-92 and -99 don't define similar kinds of functions. (There are aggregate functions, like COUNT, SUM, etc., but they operate on a particular column in all tuples and are therefore more dissimilar than similar.) This means that applications that use SQL and SQL-like query languages are less likely to be able to build and issue JCR-SQL2 queries.
A second disadvantage of these functions is that JCR-SQL2 does not allow them to be used within the SELECT clause. As a result, the location-related and score information cannot be included as columns of values in the QueryResult rows. Instead, a client can only access this information by obtaining the Node object(s) for each row. Relying upon both the result set and additional Java objects makes it difficult to use the JCR query system. It also makes certain kinds of applications impossible.
For example, ModeShape's JDBC driver is designed to enable JDBC-aware applications to query repository content using JCR-SQL2 queries. The standard JDBC API cannot expose the Node objects, so the only way to return the path-related and score information is through additional columns in the result. While such columns could always "magically" appear in the result set, doing this is not compatible with JDBC applications that dynamically build the SELECT clauses of queries based upon database metadata. Such applications require the columns to be properly described in database metadata, and the columns need to be used within queries.
ModeShape attempts to solve these issues by directly supporting a number of "pseudo-columns" within JCR-SQL2 queries, wherever columns can be used. These "pseudo-columns" include:
-
jcr:score is a column of type DOUBLE that represents the full-text search score of the node, which is a measure of the node's relevance to the full-text search expression. ModeShape does compute the scores for all queries, though the score for rows in queries that do not include a full-text search criteria may not be reliable.
-
jcr:path is a column of type PATH that represents the normalized path of a node, including same-name siblings. This is the same as what would be returned by the getPath() method of Node. Examples of paths include "/jcr:system" and "/foo/bar3".
-
jcr:name is a column of type NAME that represents the node name in its namespace-qualified form using namespace prefixes and excluding same-name-sibling indexes. Examples of node names include "jcr:system", "jcr:content", "ex:UserData", and "bar".
-
mode:localName is a column of type STRING that represents the local name of the node, which excludes the namespace prefix and same-name-sibling index. As an example, the local name of the "jcr:system" node is "system", while the local name of the "ex:UserData3" node is "UserData".
-
mode:depth is a column of type LONG that represents the depth of a node, which corresponds exactly to the number of path segments within the path. For example, the depth of the root node is 0, whereas the depth of the "/jcr:system/jcr:nodeTypes" node is 2.
-
mode:id is a column of type STRING that represents the identifier of a node as returned by javax.jcr.Node.getIdentifier().
All of these pseudo-columns can be used in the SELECT clause of any JCR-SQL2 query, and their use defines whether such columns appear in the result set. In fact, all of these pseudo-columns will be included when "SELECT *" clauses in JCR-SQL2 queries are expanded by the query engine. This means that every node type (even mixin node types that have no properties and are essentially markers) are represented by a queryable table with at least one column. However, unlike the older JCR-SQL query language, these pseudo-columns are never included in the result unless explicitly included or implicitly included with the "SELECT *" clause.
Why did ModeShape use the "jcr" namespace prefix for some of the pseudo-columns, and "mode" for the others? The older JCR-SQL language defined the "jcr:score", "jcr:path", and "jcr:name" pseudo-columns, so we just use the same names. The other columns were unique to ModeShape and are therefore defined with the "mode" namespace prefix.
Like any other column, all of these pseudo-columns can be also be used in the WHERE clause of any JCR-SQL2 query, even if they are not included in the SELECT clause. They can be used anywhere that a regular column can be used, including within constraints and dynamic operands. ModeShape will automatically rewrite queries that use pseudo-columns in the dynamic operands of constraints to use the corresponding function, such as SCORE(), PATH(), NAME(), LOCALNAME(), and DEPTH(). Additionally, any property existence constraint using these pseudo-columns will always evaluate to 'true' (and thus ModeShape's query optimizer will always remove such constraints from the query plan).
The "jcr:path" pseudo-column may also be used on both sides of an equijoin constraint clause. For example, equijoin expressions similar to:
... selector1.[jcr:path] = selector2.[jcr:path] ...
will be automatically rewritten by ModeShape's optimizer to the following form:
... ISSAMENODE(selector1,selector2) ...
As with regular columns, the pseudo-columns must be qualified with the selector name if the query contains more than one selector.
Full-text search grammar
The grammar for the full-text search expressions used in the JCR-SQL2's full-text search constraint is as follows:
FulltextSearch ::= Disjunct {Space 'OR' Space Disjunct}
Disjunct ::= Term {Space Term}
Term ::= ['-'] SimpleTerm
SimpleTerm ::= Word | '"' Word {Space Word} '"'
Word ::= NonSpaceChar {NonSpaceChar}
Space ::= SpaceChar {SpaceChar}
NonSpaceChar ::= Char - SpaceChar /* Any Char except SpaceChar */
SpaceChar ::= ' '
Char ::= /* Any character */
This grammar supports expressions similar to what you might provide to an internet search engine. Essentially, it simply lists the terms or phrases that should appear (or not appear) in the applicable property value(s). Simple terms consist of a single word (with only non-space characters), while phrases can simply be surrounded with double quotes.
Example JCR-SQL2 queries
This section walks through several JCR-SQL2 example queries, describing what each one does and, in some cases, providing alternative queries.
The basics
One of the simplest JCR-SQL2 queries finds all nodes in the current workspace of the repository:
This query will return a result set containing the "jcr:primaryType" column, since the nt:base node type defines only one single-valued, non-residual property called "jcr:primaryType".
ModeShape does not currently support returning multi-valued properties in result sets. This is permitted by the JCR 2.0 specification. ModeShape does, however, support using multi-valued properties in constraints and ORDER BY clauses.
Since our query used "SELECT *", ModeShape also includes the five non-standard pseudo-columns mentioned above: "jcr:path", "jcr:score", "jcr:name", "mode:localName", and "mode:depth". These columns are very convenient to have in the results, but also make certain criteria much easier than with the corresponding standard or ModeShape-specific functions.
Queries can explicitly specify the columns that are to be returned in the results. The following query is very similar to the previous query and will return the same rows, but the result set will have only a single column and will not include any of the pseudo-columns:
SELECT [jcr:primaryType] FROM [nt:base]
The following query will return the same rows as in the previous two queries, but the SELECT clause explicitly includes only two of the pseudo-columns for the path and depth (which are computed from the nodes' locations):
SELECT [jcr:primaryType], [jcr:path], [mode:depth] FROM [nt:base]
In JCR-SQL2, a table representing a particular node type will have a column for each of the node type's property definitions, including those inherited from supertypes. For example, the nt:file node type, its nt:hierarchyNode supertype, and the mix:created mixin type are defined using the CND notation as follows:
[mix:created] mixin
- jcr:created (date) protected
- jcr:createdBy (string) protected
[nt:hierarchyNode] > mix:created abstract
[nt:file] > nt:hierarchyNode
+ jcr:content (nt:base) primary mandatory
Therefore, the table representing the nt:file node type will have 3 columns: the "jcr:created" and "jcr:createdBy" columns inherited from the mix:created mixin node type (via the nt:hierarchyNode node type), and the "jcr:primaryType" column inherited from the nt:base node type, which is the implicit supertype of the nt:hierarchyNode (and all node types).
ModeShape adheres to this behavior with the exception that a "SELECT *" will result in the additional pseudo-columns. Thus, this next query:
is equivalent to this query:
SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy],
[jcr:path], [jcr:name], [jcr:score], [mode:localName], [mode:depth]
FROM [nt:file]
Using columns in constraints
Next, let's look at a query that selects some of the available columns from the nt:file table and uses a constraint to ensure the resulting file nodes have names that end in '.txt':
SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], [jcr:path] FROM [nt:file]
WHERE LOCALNAME() LIKE '%.txt'
ModeShape also supports placing criteria against the mode:localName pseudo-column instead of using the LOCALNAME() function. Such a query is equivalent to the previous query and will produce the exact same results:
SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], [jcr:path]
FROM [nt:file]
WHERE [mode:localName] LIKE '%.txt'
ModeShape's pseudo-columns are often far easier to use than the corresponding function-like constraints.
Although this query looks much more like SQL, the use of the '' and '' characters to quote the identifiers is not typical of a SQL dialect. ModeShape actually supports the using double-quote characters and square braces interchangeably around identifiers (although they must match around any single identifier). Again, this next query, which looks remarkably like any SQL-92 or -99 dialect, is functionally identical to the previous two queries:
SELECT "jcr:primaryType", "jcr:created", "jcr:createdBy", "jcr:path" FROM "nt:file"
WHERE "mode:localName" LIKE '%.txt'
Inner joins
In JCR-SQL2, a node will appear as a row in each table that corresponds to the node types defined by that node's primary type or mixin types, or any supertypes of these node types. In other words, a node will appear in the table corresponding to each node type for which Node.isNodeType(...) returns true.
For example, consider a node that has a primary type of nt:file but has an explicit mixin of mix:referenceable. This node will appear as a row in the all of these tables:
-
nt:file
-
mix:referenceable
-
nt:hierarchyNode
-
mix:created
-
nt:base
However, the columns in each of these tables will differ. The nt:file node type has the nt:hierarchyNode, mix:created, and nt:base for supertypes, and therefore the table for nt:file contains columns for the property definitions on all of these types. But because mix:referenceable is not a supertype of nt:file, the table for nt:file will not contain a jcr:uuid column. To obtain a single result set that contains columns for all the properties of our node, we need to perform an identity join.
The next query shows how to return all properties for nt:file nodes that are also mix:referenceable:
SELECT file.*, ref.*
FROM [nt:file] AS file
JOIN [mix:referenceable] AS ref
ON ISSAMENODE(file,ref)
Since wildcards were used in the SELECT clause, ModeShape expands the SELECT clause to include the columns for all (explicit and inherited) property definitions of each type plus pseudo-columns for each type, which is equivalent to:
SELECT file.[jcr:primaryType],
file.[jcr:created],
file.[jcr:createdBy],
file.[jcr:path],
file.[jcr:name],
file.[jcr:score],
file.[mode:localName],
file.[mode:depth],
ref.[jcr:path],
ref.[jcr:name],
ref.[jcr:score],
ref.[mode:localName],
ref.[mode:depth],
ref.[jcr:uuid]
FROM [nt:file] AS file
JOIN [mix:referenceable] AS ref
ON ISSAMENODE(file,ref)
Note because we are using an identity join, the "file.[jcr:path]" column will contain the same value as the "ref.[jcr:path]".
Fully-expand the SELECT clause to specify exactly the columns that you want, excluding the columns that return the same values or return values not needed by your application. This can also make the query a bit more efficient, since less data needs to be found and returned.
By the way, this is also what many well-written applications do when querying SQL databases.
Here is a query that does this by eliminating columns with duplicate values and using aliases that are simpler than the namespace-qualified names:
SELECT file.[jcr:primaryType] AS primaryType,
file.[jcr:created] AS created,
file.[jcr:createdBy] AS createdBy,
ref.[jcr:uuid] AS uuid,
file.[jcr:path] AS path,
file.[jcr:name] AS name,
file.[jcr:score] AS score,
file.[mode:localName] AS localName,
file.[mode:depth] AS depth
FROM [nt:file] AS file
JOIN [mix:referenceable] AS ref
ON ISSAMENODE(file,ref)
Although this query looks much more like SQL, use of the '[' and ']' characters in JCR-SQL2 to quote the identifiers is not typical of a SQL dialect. Again, ModeShape supports the using double-quote characters and square braces interchangeably around identifiers (although they must match around any single identifier). This makes it easier for existing SQL-oriented tools and applications to work more readily with ModeShape, including applications that use ModeShape's JDBC driver to query a ModeShape JCR repository.
This next query, which looks remarkably like any SQL-92 or -99 dialect, is functionally identical to the previous query. However, it uses double quotes and a pseudo-column identity constraint on "jcr:path" (which is identical in semantics and performance as the "ISSAMENODE(...)" constraint):
SELECT file."jcr:primaryType" AS primaryType,
file."jcr:created" AS created,
file."jcr:createdBy" AS createdBy,
ref."jcr:uuid" AS uuid,
file."jcr:path" AS path,
file."jcr:name" AS name,
file."jcr:score" AS score,
file."mode:localName" AS localName,
file."mode:depth" AS depth
FROM "nt:file" AS file
JOIN "mix:referenceable" AS ref
ON file."jcr:path" = ref."jcr:path"
When using joins and selecting multiple columns, use aliases on the columns to make it easier to reference those columns in constraints and ordering clauses.
Other joins
These are examples of two-way inner joins, but ModeShape supports joining multiple tables together in a single query. ModeShape also supports a variety of joins, including:
Set operations: unions, intersections, and complements
ModeShape also supports several other query features beyond JCR-SQL2. One of these is support for set queries that use:
Here is an example of a union:
SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], [jcr:path] FROM [nt:file]
UNION
SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], [jcr:path] FROM [nt:folder]
Subqueries
ModeShape also supports using (non-correlated) subqueries within the WHERE clause and wherever a static operand can be used. Subqueries can even be used within another subquery. All subqueries, though, should return a single column (all other columns will be ignored), and each row's single value will be treated as a literal value. If the subquery is used in a clause that expects a single row (e.g., in a comparison), only the subquery's first row will be used.
Subqueries in ModeShape are a powerful and easy way to use more complex criteria that is a function of the content in the repository, without having to resort to multiple queries and complex application logic, such as taking the results of one query and dynamically generating the criteria of another query.
Here's an example of a query that finds all nt:file nodes in the repository whose paths are referenced in the value of the vdb:originalFile property of the vdb:virtualDatabase nodes. (This query also uses the "$maxVersion" variable in the subquery.)
SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy], [jcr:path]
FROM [nt:file]
WHERE PATH() IN (
SELECT [vdb:originalFile] FROM [vdb:virtualDatabase]
WHERE [vdb:version] <= $maxVersion
AND CONTAINS([vdb:description],'xml OR xml maybe')
)
Without subqueries, this query would need to be broken into two separate queries: the first would find all of the paths referenced by the vdb:virtualDatabase nodes matching the version and description criteria, followed by one (or more) subsequent queries to find the nt:file nodes with the paths expressed as literal values (or variables).
Using a subquery is not only easier to implement and understand, it is actually more efficient.