JBoss.orgCommunity Documentation

Chapter 10. Querying and Searching using JCR

10.1. JCR Query API
10.2. JCR XPath Query Language
10.2.1. Column Specifiers
10.2.2. Type Constraints
10.2.3. Property Constraints
10.2.4. Path Constraints
10.2.5. Ordering Specifiers
10.2.6. Miscellaneous
10.3. JCR-SQL Query Language
10.4. JCR-SQL2 Query Language
10.4.1. Queries
10.4.2. Sources
10.4.3. Joins
10.4.4. Equi-Join Conditions
10.4.5. Same-Node Join Conditions
10.4.6. Child-Node Join Conditions
10.4.7. Descendant-Node Join Conditions
10.4.8. Constraints
10.4.9. And Constraints
10.4.10. Or Constraints
10.4.11. Not Constraints
10.4.12. Comparison Constraints
10.4.13. Between Constraints
10.4.14. Property Existence Constraints
10.4.15. Set Constraints
10.4.16. Full-text Search Constraints
10.4.17. Same-Node Constraint
10.4.18. Child-Node Constraints
10.4.19. Descendant-Node Constraints
10.4.20. Paths and Names
10.4.21. Static Operands
10.4.22. Bind Variables
10.4.23. Dynamic Operands
10.4.24. Ordering
10.4.25. Columns
10.4.26. Limit and Offset
10.5. Full-Text Search Language
10.5.1. Full-text Search Expressions

The JCR API defines a way to query a repository for content that meets user-defined criteria. The JCR API actually makes it possible for implementations to support multiple query languages, but the only language required by JCR 1.0 is a subset of XPath. The 1.0 specification also defines a SQL-like query language, but supporting it is optional.

ModeShape now supports this query feature, including the required XPath language and two other languages not defined in the JCR 1.0 specification. This chapter describes how your applications can use queries to search your repositories, and defines the three query languages that are available with ModeShape.

Like most operations in the JCR API, querying is done through a Session instance, from which can be obtained the QueryManager that defines methods for creating Query objects, storing queries as Nodes in the repository, and reconstituting queries that were stored on Nodes. Thus, querying a repository generally follows this pattern:



// Obtain the query manager for the session ...
javax.jcr.query.QueryManager queryManager = session.getWorkspace().getQueryManager();
// Create a query object ...
String language = ...
String expression = ...
javax.jcr.Query query = queryManager.createQuery(expression,language);
// Execute the query and get the results ...
javax.jcr.QueryResult result = query.execute();
// Iterate over the nodes in the results ...
javax.jcr.NodeIterator nodeIter = result.getNodes();
while ( nodeIter.hasNext() ) {
    javax.jcr.Node node = nodeIter.nextNode();
        ...
}
// Or iterate over the rows in the results ...
String[] columnNames = result.getColumnNames();
javax.jcr.query.RowIterator rowIter = result.getRows();
while ( rowIter.hasNext() ) {
    javax.jcr.query.Row row = rowIter.nextRow();
    // Iterate over the column values in each row ...
    javax.jcr.Value[] values = row.getValues();
    for ( javax.jcr.Value value : values ) {
                ...
    }
    // Or access the column values by name ...
    for ( String columnName : columnNames ) {
        javax.jcr.Value value = row.getValue(columnName);
                ...
    }
}
// When finished, close the session ...
session.logout();

For more detail about these methods or about how to use other facets of the JCR query API, please consult Section 6.7 of the JCR 1.0 specification.

The JCR 1.0 specification uses the XPath query language because node structures in JCR are very analogous to the structure of an XML document. Thus, XPath provides a useful language for selecting and searching workspace content. And since JCR 1.0 defines a mapping between XML and a workspace view called the "document view", adapting XPath to workspace content is quite natural.

A JCR XPath query specifies the subset of nodes in a workspace that satisfy the constraints defined in the query. Constraints can limit the nodes in the results to be those nodes with a specific (primary or mixin) node type, with properties having particular values, or to be within a specific subtree of the workspace. The query also defines how the nodes are to be returned in the result sets using column specifiers and ordering specifiers.

Note

As an aside, ModeShape actually implements XPath queries by transforming them into the equivalent JCR-SQL2 representation. And the JCR-SQL2 language, although often more verbose, is much more capable of representing complex queries with multiple combinations of type, property, and path constraints.

JCR 1.0 specifies that support is required only for specifying constraints of one primary type, and it is optional to support specifying constraints on one (or more) mixin types. The specification also defines that the XPath element test be used to test against node types, and that it is optional to support element tests on location steps other than the last one. Type constraints are inherently inheritance-sensitive, in that a constraint against a particular node type 'X' will be satisfied by nodes explicitly declared to be of type 'X' or of subtypes of 'X'.

ModeShape does support using the element test to test against primary or mixin type. ModeShape also only supports using an element test on the last location step. For example, the following table shows several XPath queries and how they map to JCR-SQL2 queries.


Note that the JCR-SQL2 language supported by ModeShape is far more capable of joining multiple sets of nodes with different type, property and path constraints.

JCR 1.0 specifies that attribute tests on the last location step is required, but that predicate tests on any other location steps are optional.

ModeShape does support using attribute tests on the last location step to specify property constraints, as well as supporting axis and filter predicates on other location steps. For example, the following table shows several XPath queries and how they map to JCR-SQL2 queries.


Section 6.6.3.3 of the JCR 1.0 specification contains an in-depth description of property value constraints using various comparison operators.

The JCR-SQL query language is defined by the JCR 1.0 specification as a way to express queries using strings that are similar to SQL. Support for the language is optional, and in fact this language was deprecated in the JCR 2.0 specification in favor of the improved and more powerful (and more SQL-like) JCR-SQL2 language, which is covered in the next section. As such, ModeShape does not support the original JCR-SQL language.

The JCR-SQL2 query language is defined by the JCR 2.0 specification as a way to express queries using strings that are similar to SQL. This query language is an improvement over the earlier JCR-SQL language, which has been deprecated in JCR 2.0 (see previous section).

ModeShape includes full support for the complete JCR-SQL2 query language. However, ModeShape adds several extensions to make it even more powerful:

  • Support for the "FULL OUTER JOIN" and "CROSS JOIN" join types, in addition to the "LEFT OUTER JOIN", "RIGHT OUTER JOIN" and "INNER JOIN" types defined by JCR-SQL2. Note that "JOIN" is a shorthand for "INNER JOIN". For detail, see the grammar for joins.

  • Support for the UNION, INTERSECT, and EXCEPT set operations on multiple result sets to form a single result set. As with standard SQL, the result sets being combined must have the same columns. The UNION operator combines the rows from two result sets, the INTERSECT operator returns the difference between two result sets, and the EXCEPT operator returns the rows that are common to two result sets. Duplicate rows are removed unless the operator is followed by the ALL keyword. For detail, see the grammar for set queries.

  • Removal of duplicate rows in the results, using "SELECT DISTINCT ...". For detail, see the grammar for queries.

  • Limiting the number of rows in the result set with the "LIMIT count" clause, where count is the maximum number of rows that should be returned. This clause may optionally be followed by the "OFFSET number" clause to specify the number of initial rows that should be skipped. For detail, see the grammar for limits and offsets.

  • Additional dynamic operands "DEPTH([<selectorName>])" and "PATH([<selectorName>])" that enable placing constraints on the node depth and path, respectively. These dynamic operands can be used in a manner similar to "NAME([<selectorName>])" and "LOCALNAME([<selectorName>])" that are defined by JCR-SQL2. Note in each of these cases, the selector name is optional if there is only one selector in the query. For detail, see the grammar for dynamic operands.

  • Additional dynamic operand "REFERENCE([<selectorName>.]<propertyName>)" and "REFERENCE([<selectorName>])" that enables placing constraints on one or any of the reference properties, respectively, and which can be used in a manner similar to " PropertyValue([<selectorName>.]<propertyName>)". Note in each of these cases, the selector name is optional if there is only one selector in the query, and that the property name can be excluded if the constraint should apply to all reference properties. For detail, see the grammar for dynamic operands.

  • Support for the IN and NOT IN clauses to more easily and concisely supply multiple of discrete static operands. For example, "WHERE ... [my:type].[prop1] IN (3,5,7,10,11,50) ...". For detail, see the grammar for set constraints.

  • Support for the BETWEEN clause to more easily and concisely supply a range of discrete operands. For example, "WHERE ... [my:type].[prop1] BETWEEN 3 EXCLUSIVE AND 10 ...". For detail, see the grammar for between constraints.

  • Support for simple arithmetic in numeric-based criteria and order-by clauses. For example, "... WHERE SCORE(type1) + SCORE(type2) > 1.0" or "... ORDER BY (SCORE(type1) * SCORE(type2)) ASC, LENGTH(type2.property1) DESC". For detail, see the grammar for order-by clauses.

The grammar for the JCR-SQL2 query language is actually a superset of that defined by the JCR 2.0 specification, and as such the complete grammar is included here.

Note

The grammar is presented using the same EBNF nomenclature as used in the JCR 2.0 specification. Terms are surrounded by '[' and ']' denote optional terms that appear zero or one times. Terms surrounded by '{' and '}' denote terms that appear zero or more times. Parentheses are used to identify groups, and are often used to surround possible values. Literals (or keywords) are denoted by single-quotes.

	
FullTextSearch ::= 'CONTAINS(' ([selectorName'.']propertyName | selectorName'.*') 
                           ',' ''' fullTextSearchExpression''' ')'
                   /* If only one selector exists in this query, explicit specification of the selectorName
                      preceding the propertyName is optional */

fullTextSearchExpression ::= FulltextSearch

where FulltextSearch is defined by the following, and is the same as the full-text search language supported by ModeShape:


FulltextSearch ::= Disjunct {Space 'OR' Space Disjunct}

Disjunct ::= Term {Space Term}

Term ::= ['-'] SimpleTerm

SimpleTerm ::= Word | '"' Word {Space Word} '"'

Word ::= NonSpaceChar {NonSpaceChar}

Space ::= SpaceChar {SpaceChar}

NonSpaceChar ::= Char - SpaceChar /* Any Char except SpaceChar */

SpaceChar ::= ' '

Char ::= /* Any character */

	
DynamicOperand ::= PropertyValue | ReferenceValue | Length | NodeName | NodeLocalName | NodePath | NodeDepth | 
                   FullTextSearchScore | LowerCase | UpperCase | Arithmetic | 
                   '(' DynamicOperand ')'

PropertyValue ::= [selectorName'.'] propertyName
                   /* If only one selector exists in this query, explicit specification of the selectorName
                      preceding the propertyName is optional */

ReferenceValue ::= 'REFERENCE(' selectorName '.' propertyName ')' |
                   'REFERENCE(' selectorName ')' |
                   'REFERENCE()' |
                   /* If only one selector exists in this query, explicit specification of the selectorName
                      preceding the propertyName is optional. Also, the property name may be excluded 
                      if the constraint should apply to any reference property. *&#47;

Length ::= 'LENGTH(' PropertyValue ')'

NodeName ::= 'NAME(' [selectorName] ')'
                   /* If only one selector exists in this query, explicit specification of the selectorName
                      is optional */

NodeLocalName ::= 'LOCALNAME(' [selectorName] ')'
                   /* If only one selector exists in this query, explicit specification of the selectorName
                      is optional */

NodePath ::= 'PATH(' [selectorName] ')'
                   /* If only one selector exists in this query, explicit specification of the selectorName
                      is optional */

NodeDepth ::= 'DEPTH(' [selectorName] ')'
                   /* If only one selector exists in this query, explicit specification of the selectorName
                      is optional */

FullTextSearchScore ::= 'SCORE(' [selectorName] ')'
                   /* If only one selector exists in this query, explicit specification of the selectorName
                      is optional */

LowerCase ::= 'LOWER(' DynamicOperand ')'

UpperCase ::= 'UPPER(' DynamicOperand ')'

Arithmetic ::= DynamicOperand ('+'|'-'|'*'|'/') DynamicOperand

There are times when a formal structured query language is overkill, and the easiest way to find the right content is to perform a search, like you would with a search engine such as Google or Yahoo! This is where ModeShape's full-text search language comes in, because it allows you to use the JCR query API but with a far simpler, Google-style search grammar.

This query language is actually defined by the JCR 2.0 specification as the full-text search expression grammar used in the second parameter of the CONTAINS(...) function of the JCR-SQL2 language. We just pulled it out and made it available as a first-class query language.

This language allows a JCR client to construct a query to find nodes with property values that match the supplied terms. Nodes that "best" match the terms are returned before nodes that have a lesser match. Of course, ModeShape uses a complex system to analyze the node content and the query terms, and may perform a number of optimizations, such as (but not limited to) eliminating stop words (e.g., "the", "a", "and", etc.), treating terms independent of case, and converting words to base forms using a process called stemming (e.g., "running" into "run", "customers" into "customer").

Search terms can also include phrases by simply wrapping the phrase with double-quotes. For example, the search term 'table "customer invoice"' would rank higher those nodes with properties containing the phrase "customer invoice" than nodes with properties containing just "customer" or "invoice".

Term in the query are implicitly AND-ed together, meaning that the matches occur when a node has property values that match all of the terms. However, it is also possible to put an "OR" in between two terms where either of those terms may occur.

It is also possible to specify that terms should not appear in the results. This is called a negative term, and it reduces the rank of any node whose property values contain the the value. To specify a negative term, simply prefix the term with a hyphen ('-').

The grammar for this full-text search language is specified in Section 6.7.19 of the JCR 2.0 specification, but it is also included here as a convenience.

Note

The grammar is presented using the same EBNF nomenclature as used in the JCR 2.0 specification. Terms are surrounded by '[' and ']' denote optional terms that appear zero or one times. Terms surrounded by '{' and '}' denote terms that appear zero or more times. Parentheses are used to identify groups, and are often used to surround possible values.