JBoss.org Community Documentation

Chapter 27. JCR Query Usecases

27.1. Intro
27.2. Query Lifecycle
27.2.1. Query Creation and Execution
27.2.2. Query Result Processing
27.2.3. Scoring
27.3. Query result settings
27.4. Type Constraints
27.5. Property Constraints
27.6. Path Constraint
27.7. Ordering specification
27.8. Fulltext Search
27.9. Indexing rules and additional features
27.10. Query Examples
27.10.1. SetOffset and SetLimit
27.10.2. Finding All Nodes
27.10.3. Finding Nodes by Primary Type
27.10.4. Finding Nodes by Mixin Type
27.10.5. Property Comparison
27.10.6. LIKE Constraint
27.10.7. Escaping in LIKE Statements
27.10.8. NOT Constraint
27.10.9. AND Constraint
27.10.10. OR Constraint
27.10.11. Property Existence Constraint
27.10.12. Finding Nodes in a Case-Insensitive Way
27.10.13. Date Property Comparison
27.10.14. Node Name Constraint
27.10.15. Multivalue Property Comparison
27.10.16. Exact Path Constraint
27.10.17. Child Node Constraint
27.10.18. Finding All Descendant Nodes
27.10.19. Sorting Nodes by Property
27.10.20. Ordering by Descendant Nodes Property (XPath only)
27.10.21. Ordering by Score
27.10.22. Ordering by Path or Name
27.10.23. Fulltext Search by Property
27.10.24. Fulltext Search by All Properties in Node
27.10.25. Ignoring Accent Symbols. New Analyzer Setting.
27.10.26. Finding nt:file node by content of child jcr:content node
27.10.27. Changing Priority of Node
27.10.28. Removing Nodes Property From Indexing Scope
27.10.29. Regular Expression as Property Name in Indexing Rules
27.10.30. Highlighting Results of Fulltext Search
27.10.31. Searching by Synonym
27.10.32. Checking the spelling of Phrase
27.10.33. Finding Similar Nodes
27.11. Tips and tricks
27.11.1. XPath queries containing node names starting with a number

JCR supports two query languages: SQL and XPath. A query, whether XPath or SQL, specifies a subset of nodes within a workspace, called the result set. The result set consists of all the nodes in the workspace that meet the constraints stated in the query.

Find all nodes in the repository. Only those nodes are found to which the session has READ permission. See also Access Control.
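
A minimal sketch of the two statements for this use case, assuming the base node type nt:base is used as the selector. The class name FindAllNodes is hypothetical and only holds the strings; each one would be passed to queryManager.createQuery(statement, Query.SQL) or (statement, Query.XPATH).

```java
// Hypothetical holder class; not part of the JCR API.
final class FindAllNodes {
    // nt:base is the root of the node type hierarchy, so it matches every node.
    static final String SQL = "SELECT * FROM nt:base";
    static final String XPATH = "//element(*, nt:base)";
}
```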

Find all nodes in the repository that have the mixin type "mix:title".
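
The statements below are a sketch of how this could be written; a mixin type is assumed to be usable as a selector in the same way as a primary type. FindByMixinType is a hypothetical name.

```java
// Hypothetical holder class; not part of the JCR API.
final class FindByMixinType {
    // The mixin type name is used as the selector in both languages.
    static final String SQL = "SELECT * FROM mix:title";
    static final String XPATH = "//element(*, mix:title)";
}
```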

Find all nodes with mixin type 'mix:title' where the prop_pagecount property contains a value less than 90. Only select the title of each node.
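
One possible form of these statements, assuming prop_pagecount is stored as a numeric property. The class and method names are hypothetical.

```java
// Hypothetical helper; builds the statements for a given page limit.
final class PropertyComparison {
    // Selects only the jcr:title column of nodes with fewer pages than maxPages.
    static String sql(long maxPages) {
        return "SELECT jcr:title FROM mix:title WHERE prop_pagecount < " + maxPages;
    }

    // The trailing /@jcr:title restricts the selected properties in XPath.
    static String xpath(long maxPages) {
        return "//element(*, mix:title)[@prop_pagecount < " + maxPages + "]/@jcr:title";
    }
}
```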

Find all nodes with mixin type 'mix:title' and where the property 'jcr:title' starts with 'P'.
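
A sketch of the "starts with" constraint, using the '%' wildcard for "any sequence of characters". LikeConstraint is a hypothetical name; jcr:like is the XPath counterpart of the SQL LIKE operator.

```java
// Hypothetical holder class; not part of the JCR API.
final class LikeConstraint {
    // 'P%' matches every title that begins with an upper-case P.
    static final String SQL = "SELECT * FROM mix:title WHERE jcr:title LIKE 'P%'";
    static final String XPATH = "//element(*, mix:title)[jcr:like(@jcr:title, 'P%')]";
}
```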

Find all nodes with a mixin type 'mix:title' and whose property 'jcr:title' starts with 'P%ri'.

As you can see, "P%rison break" contains the symbol '%', which is reserved as a wildcard in LIKE comparisons, so it has to be escaped.

Within the LIKE pattern, literal instances of percent ("%") or underscore ("_") must be escaped. The SQL ESCAPE clause allows the definition of an arbitrary escape character within the context of a single LIKE statement. The following example defines the backslash '\' as the escape character:

SELECT * FROM mytype WHERE a LIKE 'foo\%' ESCAPE '\'

XPath does not provide a way to define the escape character, so the default escape character ('\') must be used.
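
The escaping rule can be sketched as a small helper that prefixes each literal '%' or '_' with the escape character before the pattern is put into a statement. LikeEscape and its methods are hypothetical names, and the helper does not handle single quotes inside the literal.

```java
// Hypothetical helper; not part of the JCR API.
final class LikeEscape {
    // Prefix every literal '%', '_' and the escape character itself with esc.
    static String escape(String literal, char esc) {
        StringBuilder sb = new StringBuilder();
        for (char c : literal.toCharArray()) {
            if (c == '%' || c == '_' || c == esc) {
                sb.append(esc);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // "Starts with prefix" in SQL, declaring the backslash in the ESCAPE clause.
    static String sql(String prefix) {
        return "SELECT * FROM mix:title WHERE jcr:title LIKE '"
            + escape(prefix, '\\') + "%' ESCAPE '\\'";
    }

    // The same constraint in XPath, relying on the default escape character.
    static String xpath(String prefix) {
        return "//element(*, mix:title)[jcr:like(@jcr:title, '"
            + escape(prefix, '\\') + "%')]";
    }
}
```

For the prefix P%ri, the helper produces the pattern P\%ri%, where only the trailing '%' acts as a wildcard.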

Find all nodes with a mixin type 'mix:title' where the property 'jcr:title' does NOT start with the symbol 'P'.
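
A sketch of the negated constraint, assuming NOT is applied to a LIKE comparison in SQL and the not() function wraps jcr:like in XPath. NotConstraint is a hypothetical name.

```java
// Hypothetical holder class; not part of the JCR API.
final class NotConstraint {
    static final String SQL = "SELECT * FROM mix:title WHERE NOT jcr:title LIKE 'P%'";
    static final String XPATH = "//element(*, mix:title)[not(jcr:like(@jcr:title, 'P%'))]";
}
```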

Find all fairytales with a page count of more than 90 pages.

In JCR terms: find all nodes with mixin type 'mix:title' where the property 'jcr:description' equals "fairytale" and whose "prop_pagecount" property value is greater than 90.
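
A sketch of the combined constraint, following the "more than 90 pages" wording of the use case. AndConstraint is a hypothetical name.

```java
// Hypothetical holder class; not part of the JCR API.
final class AndConstraint {
    // Both conditions must hold for a node to be returned.
    static final String SQL =
        "SELECT * FROM mix:title WHERE jcr:description = 'fairytale' AND prop_pagecount > 90";
    static final String XPATH =
        "//element(*, mix:title)[@jcr:description = 'fairytale' and @prop_pagecount > 90]";
}
```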

Find all documents whose title is 'Cinderella' or whose description is 'novel'.

In JCR terms: find all nodes with a mixin type 'mix:title' whose property 'jcr:title' equals "Cinderella" or whose "jcr:description" property value is "novel".
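
The alternative constraint could be written as follows. OrConstraint is a hypothetical name.

```java
// Hypothetical holder class; not part of the JCR API.
final class OrConstraint {
    // A node is returned if at least one of the two conditions holds.
    static final String SQL =
        "SELECT * FROM mix:title WHERE jcr:title = 'Cinderella' OR jcr:description = 'novel'";
    static final String XPATH =
        "//element(*, mix:title)[@jcr:title = 'Cinderella' or @jcr:description = 'novel']";
}
```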

Find all nodes with a mixin type 'mix:title' where the property 'jcr:description' does not exist (is null).
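
A sketch of the existence test: SQL uses IS NULL, while XPath negates a bare attribute reference. ExistenceConstraint is a hypothetical name.

```java
// Hypothetical holder class; not part of the JCR API.
final class ExistenceConstraint {
    // Matches nodes that do not have a jcr:description property at all.
    static final String SQL = "SELECT * FROM mix:title WHERE jcr:description IS NULL";
    static final String XPATH = "//element(*, mix:title)[not(@jcr:description)]";
}
```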

Find all nodes with a mixin type 'mix:title' and where the property 'jcr:title' equals 'casesensitive' in lower or upper case.
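
One way to sketch a case-insensitive comparison is to upper-case both sides: the stored value with UPPER() in SQL or fn:upper-case() in XPath, and the search value in Java. CaseInsensitiveMatch is a hypothetical name, and the search value is assumed to contain no single quotes.

```java
import java.util.Locale;

// Hypothetical helper; not part of the JCR API.
final class CaseInsensitiveMatch {
    // The literal is upper-cased here so that it can be compared with UPPER(jcr:title).
    static String sql(String title) {
        return "SELECT * FROM mix:title WHERE UPPER(jcr:title) = '"
            + title.toUpperCase(Locale.ROOT) + "'";
    }

    static String xpath(String title) {
        return "//element(*, mix:title)[fn:upper-case(@jcr:title) = '"
            + title.toUpperCase(Locale.ROOT) + "']";
    }
}
```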

Find all nodes of primary type "nt:resource" whose jcr:lastModified property value is greater than 2006-06-04 and less than 2008-06-04.

SQL

In SQL, you have to use the keyword TIMESTAMP for date comparisons. Otherwise, the date would be interpreted as a string. The date has to be surrounded by single quotes (TIMESTAMP 'datetime') and written in the ISO 8601 format YYYY-MM-DDThh:mm:ss.sTZD (see http://en.wikipedia.org/wiki/ISO_8601; the format is also well explained in the W3C note http://www.w3.org/TR/NOTE-datetime).

You will see that it can be a date only (YYYY-MM-DD) but also a complete date and time with a timezone designator (TZD).

// make SQL query
QueryManager queryManager = workspace.getQueryManager();
// create query
StringBuffer sb = new StringBuffer();
sb.append("select * from nt:resource where ");
sb.append("( jcr:lastModified >= TIMESTAMP '");
sb.append("2006-06-04T15:34:15.917+02:00");
sb.append("' )");
sb.append(" and ");
sb.append("( jcr:lastModified <= TIMESTAMP '");
sb.append("2008-06-04T15:34:15.917+02:00");
sb.append("' )");
String sqlStatement = sb.toString();
Query query = queryManager.createQuery(sqlStatement, Query.SQL);
// execute query and fetch result
QueryResult result = query.execute();

XPath

Compared to the SQL format, you have to use the keyword xs:dateTime and wrap the quoted datetime in parentheses: xs:dateTime('datetime'). The datetime format also conforms to the ISO date standard.

// make XPath query
QueryManager queryManager = workspace.getQueryManager();
// create query
StringBuffer sb = new StringBuffer();
sb.append("//element(*,nt:resource)");
sb.append("[");
sb.append("@jcr:lastModified >= xs:dateTime('2006-06-04T15:34:15.917+02:00')");
sb.append(" and ");
sb.append("@jcr:lastModified <= xs:dateTime('2008-06-04T15:34:15.917+02:00')");
sb.append("]");
String xpathStatement = sb.toString();
Query query = queryManager.createQuery(xpathStatement, Query.XPATH);
// execute query and fetch result
QueryResult result = query.execute();
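
Instead of hard-coding the datetime strings, the literals for both languages can be produced from a java.time value. This is a sketch only: JcrDateLiteral and its methods are hypothetical names, and the fixed pattern is an assumption chosen to reproduce the format used in the examples above.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; not part of the JCR API.
final class JcrDateLiteral {
    // Always prints milliseconds and a numeric offset, e.g. 2008-06-04T15:34:15.917+02:00.
    private static final DateTimeFormatter ISO =
        DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSxxx");

    // Literal for SQL statements.
    static String sqlTimestamp(OffsetDateTime dt) {
        return "TIMESTAMP '" + dt.format(ISO) + "'";
    }

    // Literal for XPath statements.
    static String xpathDateTime(OffsetDateTime dt) {
        return "xs:dateTime('" + dt.format(ISO) + "')";
    }
}
```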

Find all nodes with primary type 'nt:file' whose node name is 'document'. The node name is accessible by a function called "fn:name()".
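
A sketch of the node name constraint with fn:name(). Its use in an SQL statement is assumed to be supported as an extension by the query handler. NodeNameConstraint is a hypothetical name.

```java
// Hypothetical holder class; not part of the JCR API.
final class NodeNameConstraint {
    // fn:name() in SQL is an assumption about the query handler, not standard SQL.
    static final String SQL = "SELECT * FROM nt:file WHERE fn:name() = 'document'";
    static final String XPATH = "//element(*, nt:file)[fn:name() = 'document']";
}
```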

Find all nodes with the primary type 'nt:unstructured' whose property 'multiprop' contains both values "one" and "two".
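
For a multivalue property, a comparison is assumed to be satisfied if any of the values matches, so two comparisons joined with AND require both values to be present. MultiValueConstraint is a hypothetical name.

```java
// Hypothetical holder class; not part of the JCR API.
final class MultiValueConstraint {
    // Each comparison is satisfied by any single value of the multivalue property.
    static final String SQL =
        "SELECT * FROM nt:unstructured WHERE multiprop = 'one' AND multiprop = 'two'";
    static final String XPATH =
        "//element(*, nt:unstructured)[@multiprop = 'one' and @multiprop = 'two']";
}
```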

Find a node with the primary type 'nt:file' that is located on the exact path "/folder1/folder2/document1".
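
A sketch of the exact path constraint. In SQL the path is compared through the jcr:path pseudo-property; in XPath the [1] indexes are an assumption used to exclude same-name siblings along the path. ExactPathConstraint is a hypothetical name.

```java
// Hypothetical holder class; not part of the JCR API.
final class ExactPathConstraint {
    static final String SQL =
        "SELECT * FROM nt:file WHERE jcr:path = '/folder1/folder2/document1'";
    // [1] selects the first node among same-name siblings at each level.
    static final String XPATH =
        "/jcr:root/folder1[1]/folder2[1]/element(document1, nt:file)[1]";
}
```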

Find all nodes with the primary type 'nt:folder' that are children of the node at path "/root1/root2". Only direct children are found, not further descendants.
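
In SQL, "children only" can be sketched with two LIKE constraints on jcr:path: the first matches all descendants, the second removes everything deeper than one level. In XPath a single '/' step is enough. ChildNodeConstraint is a hypothetical name, and the parent path is assumed to contain no characters that need escaping.

```java
// Hypothetical helper; not part of the JCR API.
final class ChildNodeConstraint {
    // parentPath is an absolute path without a trailing slash, e.g. /root1/root2.
    static String sql(String parentPath) {
        return "SELECT * FROM nt:folder WHERE jcr:path LIKE '" + parentPath + "/%'"
            + " AND NOT jcr:path LIKE '" + parentPath + "/%/%'";
    }

    // A single '/' step before element() selects direct children only.
    static String xpath(String parentPath) {
        return "/jcr:root" + parentPath + "/element(*, nt:folder)";
    }
}
```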

Find all nodes with the primary type 'nt:folder' that are descendants of the node "/folder1/folder2".
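
A sketch of the descendant constraint: a single LIKE on jcr:path in SQL, and a '//' step in XPath. DescendantConstraint is a hypothetical name.

```java
// Hypothetical holder class; not part of the JCR API.
final class DescendantConstraint {
    // '%' after the trailing slash matches nodes at any depth below the folder.
    static final String SQL =
        "SELECT * FROM nt:folder WHERE jcr:path LIKE '/folder1/folder2/%'";
    static final String XPATH =
        "/jcr:root/folder1/folder2//element(*, nt:folder)";
}
```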

Select all nodes with the mixin type 'mix:title' and order them by the 'prop_pagecount' property.
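
A sketch of the ordering clause in both languages; ascending order is assumed here because the use case does not name a direction. OrderByProperty is a hypothetical name.

```java
// Hypothetical holder class; not part of the JCR API.
final class OrderByProperty {
    static final String SQL = "SELECT * FROM mix:title ORDER BY prop_pagecount ASC";
    static final String XPATH = "//element(*, mix:title) order by @prop_pagecount ascending";
}
```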

Select all nodes with the mixin type 'mix:title' containing any word from the set {'brown','fox','jumps'}. Then sort the result by score in ascending order. This way, nodes that match the query statement better are placed last in the result list.
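
A sketch of such a statement, assuming the fulltext expression accepts OR between terms and that the score is addressed as jcr:score() in the ordering clause. OrderByScore is a hypothetical name.

```java
// Hypothetical holder class; not part of the JCR API.
final class OrderByScore {
    // Any of the three words is enough for a match; better matches score higher.
    static final String SQL =
        "SELECT * FROM mix:title WHERE CONTAINS(*, 'brown OR fox OR jumps')"
        + " ORDER BY jcr:score() ASC";
    static final String XPATH =
        "//element(*, mix:title)[jcr:contains(., 'brown OR fox OR jumps')]"
        + " order by jcr:score() ascending";
}
```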

Find all nodes with the mixin type 'mix:title' whose 'jcr:description' property contains the string "forest".
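
Fulltext search restricted to one property could look like this; the first argument of CONTAINS / jcr:contains names the property to search. FulltextByProperty is a hypothetical name.

```java
// Hypothetical holder class; not part of the JCR API.
final class FulltextByProperty {
    static final String SQL =
        "SELECT * FROM mix:title WHERE CONTAINS(jcr:description, 'forest')";
    static final String XPATH =
        "//element(*, mix:title)[jcr:contains(@jcr:description, 'forest')]";
}
```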

Find nodes with the mixin type 'mix:title' where any property contains the string 'break'.
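
To search all indexed properties of a node, the scope argument becomes '*' in SQL and '.' in XPath. A sketch, with FulltextByNode as a hypothetical name:

```java
// Hypothetical holder class; not part of the JCR API.
final class FulltextByNode {
    // '*' (SQL) and '.' (XPath) address the node scope instead of one property.
    static final String SQL = "SELECT * FROM mix:title WHERE CONTAINS(*, 'break')";
    static final String XPATH = "//element(*, mix:title)[jcr:contains(., 'break')]";
}
```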

In this example, we will create a new Analyzer, set it in the QueryHandler configuration, and run a query to check it.

The standard analyzer does not normalize accents like é, è, à, so a word like 'tréma' is stored in the index as 'tréma'. What if we want to normalize such symbols, that is, store the word 'tréma' as 'trema'?

There are two ways of setting up a new Analyzer (whether a standard one or our own):

  • The first way: create a new Analyzer (if no existing one fits our needs) and set it in the search index;

  • The second way: register the new Analyzer in the QueryHandler configuration (supported since version 1.12).

We will use the second one:

  • Create the new MyAnalyzer

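// the package matches the class name registered in the query-handler configuration
package org.exoplatform.services.jcr.impl.core;

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ISOLatin1AccentFilter;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class MyAnalyzer extends Analyzer
{
   @Override
   public TokenStream tokenStream(String fieldName, Reader reader)
   {
      // split the incoming text into tokens
      StandardTokenizer tokenizer = new StandardTokenizer(reader);
      // process all text with the standard filter:
      // removes 's (as 's in "Peter's") from the end of words and removes dots from acronyms
      TokenStream result = new StandardFilter(tokenizer);
      // this filter normalizes token text to lower case
      result = new LowerCaseFilter(result);
      // this one replaces accented characters in the ISO Latin 1 character set (ISO-8859-1) by their unaccented equivalents
      result = new ISOLatin1AccentFilter(result);
      // and finally return the token stream
      return result;
   }
}

  • Then, register the new MyAnalyzer in the configuration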

<workspace name="ws">
   ...
   <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
      <properties>
         <property name="analyzer" value="org.exoplatform.services.jcr.impl.core.MyAnalyzer"/>
         ...
      </properties>
   </query-handler>
   ...
</workspace>

After that, check it with a query:

Find nodes with the mixin type 'mix:title' whose 'jcr:title' contains the strings "tréma" and "naïve".
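
The sketch below shows a possible check query together with a JDK-only approximation of what the accent filter does to a term. The fold method uses java.text.Normalizer and is not the Lucene filter itself; it only illustrates why 'tréma' and 'trema' end up as the same indexed term. AccentFolding and its members are hypothetical names.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper; not part of the JCR or Lucene API.
final class AccentFolding {
    // Two terms separated by a space are both required by the fulltext expression.
    static final String SQL =
        "SELECT * FROM mix:title WHERE CONTAINS(jcr:title, 'tr\u00E9ma na\u00EFve')";
    static final String XPATH =
        "//element(*, mix:title)[jcr:contains(@jcr:title, 'tr\u00E9ma na\u00EFve')]";

    // Approximation only: decompose accented characters, drop the combining marks, lower-case.
    static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }
}
```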

The node type nt:file represents a file. It requires a single child node, called jcr:content. This node type represents images and other binary content in a JCRWiki entry. The node type of jcr:content is nt:resource, which represents the actual content of a file.

Find nodes with the primary type 'nt:file' whose 'jcr:content' child node contains "cats".

Normally, such nodes cannot be found using plain JCR SQL or XPath queries, because the text belongs to the child node. But we can configure indexing so that nt:file aggregates its jcr:content child node.

So, change indexing-configuration.xml:

<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.2.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
    <aggregate primaryType="nt:file">
        <include>jcr:content</include>
        <include>jcr:content/*</include>
        <include-property>jcr:content/jcr:lastModified</include-property>
    </aggregate>
</configuration>

Now the content of 'nt:file' and 'jcr:content' ('nt:resource') nodes are concatenated in a single Lucene document. Then, we can make a fulltext search query by content of 'nt:file'; this search includes the content of child 'jcr:content' node.
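
With the aggregate in place, a node-scope fulltext query on nt:file could be sketched as follows. FileContentSearch is a hypothetical name.

```java
// Hypothetical holder class; not part of the JCR API.
final class FileContentSearch {
    // The node scope of nt:file now also covers the text of the jcr:content child.
    static final String SQL = "SELECT * FROM nt:file WHERE CONTAINS(*, 'cats')";
    static final String XPATH = "//element(*, nt:file)[jcr:contains(., 'cats')]";
}
```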

In this example, we will set different boost values for predefined nodes and check the effect by selecting those nodes and ordering them by jcr:score.

The default boost value is 1.0. Higher boost values (a reasonable range is 1.0 - 5.0) will yield a higher score value and appear as more relevant.

In this example, we will exclude the 'text' property of some nt:unstructured nodes from indexing. As a result, such a node will not be found by the content of this property, even if it satisfies all other constraints.

First of all, add rules to indexing-configuration.xml:

<index-rule nodeType="nt:unstructured" condition="@rule='nsiTrue'">
    <!-- default value for nodeScopeIndex is true -->
    <property>text</property>
</index-rule>

<index-rule nodeType="nt:unstructured" condition="@rule='nsiFalse'">
    <!-- do not include text in node scope index -->
    <property nodeScopeIndex="false">text</property>
</index-rule>

In this example, we want to configure indexing as follows: all properties of nt:unstructured nodes must be excluded from search, except properties whose names end with the string 'Text'. First of all, add rules to indexing-configuration.xml:

<index-rule nodeType="nt:unstructured">
   <property isRegexp="true">.*Text</property>
</index-rule>

Now, let's check this rule with a simple query: select all nodes with the primary type 'nt:unstructured' that contain the string 'quick' (fulltext search over the whole node).
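
A sketch of the check query, plus a small illustration of which property names the expression .*Text covers. The isIndexed method assumes the rule is matched against the whole property name; this is an assumption about the rule semantics, and IndexRuleRegex is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of the JCR API.
final class IndexRuleRegex {
    static final String SQL = "SELECT * FROM nt:unstructured WHERE CONTAINS(*, 'quick')";
    static final String XPATH = "//element(*, nt:unstructured)[jcr:contains(., 'quick')]";

    // The same expression as in the index rule.
    private static final Pattern RULE = Pattern.compile(".*Text");

    // Assumption: the whole property name has to match the expression.
    static boolean isIndexed(String propertyName) {
        return RULE.matcher(propertyName).matches();
    }
}
```

Under that assumption, a property named someText is indexed, while text and Textual are not.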

This is also called excerption (see Excerpt configuration in the Search Configuration and Searching Repository articles).

The goal of this query is to find the words "eXo" and "implementation" with a fulltext search and highlight these words in the result value.
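
The statements below are a sketch in the form used by Jackrabbit-derived query handlers, where the excerpt is requested as an extra column: excerpt(.) in SQL and rep:excerpt(.) in XPath. Whether this exact syntax applies to a given version should be checked against the Searching Repository article. HighlightQuery is a hypothetical name.

```java
// Hypothetical holder class; not part of the JCR API.
final class HighlightQuery {
    // Assumption: the excerpt column is named excerpt(.) in SQL.
    static final String SQL =
        "SELECT excerpt(.) FROM mix:title WHERE CONTAINS(*, 'eXo implementation')";
    // Assumption: the excerpt column is named rep:excerpt(.) in XPath.
    static final String XPATH =
        "//element(*, mix:title)[jcr:contains(., 'eXo implementation')]/(rep:excerpt(.))";
}
```

The highlighted fragment would then be read from each result row as the value of the excerpt column.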

Find all mix:title nodes whose title contains synonyms of the word 'fast'.

The synonym provider must be configured in indexing-configuration.xml:

<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
   <properties>
      ...
      <property name="synonymprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.PropertiesSynonymProvider" />
      <property name="synonymprovider-config-path" value="../../synonyms.properties" />
      ...
   </properties>
</query-handler>

The file synonyms.properties contains the following list of synonyms:

ASF=Apache Software Foundation
quick=fast
sluggish=lazy
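
With this list, a synonym search for 'fast' is expected to also match titles containing 'quick'. The sketch assumes the Jackrabbit-style '~' prefix, which asks the fulltext expression to include synonyms of the term. SynonymQuery is a hypothetical name.

```java
// Hypothetical holder class; not part of the JCR API.
final class SynonymQuery {
    // Assumption: '~' in front of a term expands it with the configured synonyms.
    static final String SQL = "SELECT * FROM mix:title WHERE CONTAINS(jcr:title, '~fast')";
    static final String XPATH = "//element(*, mix:title)[jcr:contains(@jcr:title, '~fast')]";
}
```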

Check the correct spelling of phrase 'quik OR (-foo bar)' according to data already stored in index.

The SpellChecker must be configured in the query-handler configuration.

test-jcr-config.xml:

<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
   <properties>
      ...
      <property name="spellchecker-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker$FiveSecondsRefreshInterval" />
      ...
   </properties>
</query-handler>
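
A sketch of a spell check request in the form used by Jackrabbit-derived query handlers: the query is run against the root node, and the suggestion is returned in the rep:spellcheck() column of the single result row. The exact syntax is an assumption for this implementation, and SpellCheckQuery is a hypothetical name.

```java
// Hypothetical holder class; not part of the JCR API.
final class SpellCheckQuery {
    // The phrase to check, taken from the use case.
    static final String PHRASE = "quik OR (-foo bar)";

    // Assumption: SPELLCHECK() and rep:spellcheck() are provided by the query handler.
    static final String SQL =
        "SELECT rep:spellcheck() FROM nt:base WHERE jcr:path = '/' AND SPELLCHECK('"
        + PHRASE + "')";
    static final String XPATH =
        "/jcr:root[rep:spellcheck('" + PHRASE + "')]/(rep:spellcheck())";
}
```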

Find nodes similar to the node at path '/baseFile/jcr:content'.

In our example, baseFile contains text in which the word "terms" occurs many times. That is why the presence of this word is used as the criterion of node similarity (for the node baseFile).

Highlighting support must be added to the configuration. test-jcr-config.xml:

<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
   <properties>
      ...
      <property name="support-highlighting" value="true" />
      ...
   </properties>
</query-handler>
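
A sketch of the similarity query, assuming the Jackrabbit-style SIMILAR() and rep:similar() functions, which take the scope and the path of the reference node. SimilarNodesQuery is a hypothetical name.

```java
// Hypothetical holder class; not part of the JCR API.
final class SimilarNodesQuery {
    // The second argument is the path of the node to compare against.
    static final String SQL =
        "SELECT * FROM nt:resource WHERE SIMILAR(., '/baseFile/jcr:content')";
    static final String XPATH =
        "//element(*, nt:resource)[rep:similar(., '/baseFile/jcr:content')]";
}
```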

If you execute an XPath request like this:

XPath

// get QueryManager
QueryManager queryManager = workspace.getQueryManager(); 
// make XPath query
Query query = queryManager.createQuery("/jcr:root/Documents/Publie/2010//element(*, exo:article)", Query.XPATH);

You will get an error: "Invalid request". This happens because XML does not allow names starting with a number, and XPath follows the XML naming rules: http://www.w3.org/TR/REC-xml/#NT-Name

Therefore, you cannot do XPath requests using a node name that starts with a number.

Easy workarounds:

  • Use an SQL request.

  • Use escaping:

XPath

// get QueryManager
QueryManager queryManager = workspace.getQueryManager(); 
// make XPath query
Query query = queryManager.createQuery("/jcr:root/Documents/Publie/_x0032_010//element(*, exo:article)", Query.XPATH);
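
The escaped form above replaces the leading digit with _xHHHH_, where HHHH is the four-digit hexadecimal code of the character. A minimal sketch of that step, assuming only a leading ASCII digit has to be handled; XPathNameEscape is a hypothetical name, not a class of the JCR implementation.

```java
// Hypothetical helper; not part of the JCR API.
final class XPathNameEscape {
    // Encode a leading ASCII digit as _xHHHH_ and leave all other names unchanged.
    static String escapeLeadingDigit(String name) {
        if (name.isEmpty()) {
            return name;
        }
        char first = name.charAt(0);
        if (first < '0' || first > '9') {
            return name;
        }
        return String.format("_x%04x_", (int) first) + name.substring(1);
    }
}
```

Each path segment would be passed through the helper before the XPath statement is assembled, so that 2010 becomes _x0032_010 while Documents and Publie stay as they are.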