Chapter 31. Searching Repository Content

The JCR configuration file is located at .../portal/WEB-INF/conf/jcr/repository-configuration.xml. See also Search Configuration for more information about index configuration.

31.2. Bi-directional RangeIterator (since 1.9)

QueryResult.getNodes() returns a bi-directional NodeIterator implementation.

Note

Bi-directional NodeIterator is not supported in two cases:

SQL query: select * from nt:base
XPath query: //*

TwoWayRangeIterator interface:

/**
 * Skip a number of elements in the iterator.
 * 
 * @param skipNum the non-negative number of elements to skip
 * @throws java.util.NoSuchElementException if skipped past the first element
 *           in the iterator.
 */
public void skipBack(long skipNum);

Usage:

NodeIterator iter = queryResult.getNodes();
while (iter.hasNext()) {
  if (skipForward) {
    iter.skip(10); // Skip 10 nodes in forward direction
  } else if (skipBack) {
    TwoWayRangeIterator backIter = (TwoWayRangeIterator) iter;
    backIter.skipBack(10); // Skip 10 nodes back
  }
  // ... process the nodes at the new position ...
}
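The skip semantics can be tried out without a repository. The following is a minimal sketch that models forward and backward skipping over an in-memory list; TwoWayCursorSketch and its getPosition() method are hypothetical stand-ins written for illustration, not the eXo JCR implementation. Rejecting negative arguments with IllegalArgumentException is also an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

/** Simplified, hypothetical model of a bi-directional range iterator. */
class TwoWayCursorSketch<T> {
    private final List<T> items;
    private int position; // index of the element that next() would return

    TwoWayCursorSketch(List<T> items) {
        this.items = items;
    }

    boolean hasNext() {
        return position < items.size();
    }

    T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return items.get(position++);
    }

    /** Moves forward; fails if that would pass the last element. */
    void skip(long skipNum) {
        if (skipNum < 0) {
            throw new IllegalArgumentException("skipNum must be non-negative");
        }
        if (skipNum > items.size() - position) {
            throw new NoSuchElementException();
        }
        position += (int) skipNum;
    }

    /** Moves backward; fails if that would pass the first element. */
    void skipBack(long skipNum) {
        if (skipNum < 0) {
            throw new IllegalArgumentException("skipNum must be non-negative");
        }
        if (skipNum > position) {
            throw new NoSuchElementException();
        }
        position -= (int) skipNum;
    }

    long getPosition() {
        return position;
    }

    public static void main(String[] args) {
        TwoWayCursorSketch<String> cursor =
            new TwoWayCursorSketch<>(Arrays.asList("a", "b", "c", "d", "e"));
        cursor.skip(3);     // position moves from 0 to 3
        cursor.skipBack(2); // position moves from 3 to 1
        System.out.println(cursor.next());
    }
}
```

With a real query result, the same pattern applies after casting the NodeIterator to TwoWayRangeIterator, as in the usage listing above.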

31.3. Fuzzy Searches (since 1.0)

JCR supports Lucene fuzzy searches. For details on the syntax, see Apache Lucene - Query Parser Syntax.

To use a fuzzy search, append a tilde (~) to the search term, as in the following query:

QueryManager qman = session.getWorkspace().getQueryManager();
Query q = qman.createQuery("select * from nt:base where contains(field, 'ccccc~')", Query.SQL);
QueryResult res = q.execute();
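When the search word comes from user input, the statement has to be assembled at run time. The following is a minimal sketch of such a helper; FuzzyQuerySketch and fuzzySql are hypothetical names, and the helper only builds the statement string, which would then be passed to QueryManager.createQuery() with Query.SQL. It assumes a single search word and escapes single quotes by doubling them.

```java
/** Builds fuzzy full-text SQL statements (hypothetical helper, not part of JCR). */
class FuzzyQuerySketch {

    /**
     * Returns a SQL statement whose contains() clause matches terms similar
     * to the given word. The scope is the first argument of contains(),
     * for example "." for all indexed properties of the node.
     */
    static String fuzzySql(String nodeType, String scope, String term) {
        if (term == null || term.trim().isEmpty() || term.trim().contains(" ")) {
            throw new IllegalArgumentException("expected a single search word");
        }
        // A single quote inside a SQL string literal is escaped by doubling it.
        String escaped = term.trim().replace("'", "''");
        // The trailing tilde marks the term as a Lucene fuzzy term.
        return "select * from " + nodeType
            + " where contains(" + scope + ", '" + escaped + "~')";
    }

    public static void main(String[] args) {
        System.out.println(fuzzySql("nt:base", ".", "ccccc"));
    }
}
```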

31.4. SynonymSearch (since 1.9)

Searching with synonyms is integrated into the jcr:contains() function and uses the same syntax as synonym searches in Google. If a search term is prefixed with a tilde symbol ( ~ ), synonyms of the search term are also taken into consideration. For example:

SQL: select * from nt:resource where contains(., '~parameter')

XPath: //element(*, nt:resource)[jcr:contains(., '~parameter')]

This feature is disabled by default. To enable it, add the following configuration parameters to the query-handler element in your JCR configuration file:

<param name="synonymprovider-config-path" value="..your path to configuration file....."/>
<param name="synonymprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.PropertiesSynonymProvider"/>

The class named in synonymprovider-class must implement the SynonymProvider interface:

/**
 * <code>SynonymProvider</code> defines an interface for a component that
 * returns synonyms for a given term.
 */
public interface SynonymProvider {

   /**
    * Initializes the synonym provider and passes the file system resource to
    * the synonym provider configuration defined by the configuration value of
    * the <code>synonymProviderConfigPath</code> parameter. The resource may be
    * <code>null</code> if the configuration parameter is not set.
    *
    * @param fsr the file system resource to the synonym provider
    *            configuration.
    * @throws IOException if an error occurs while initializing the synonym
    *                     provider.
    */
   public void initialize(InputStream fsr) throws IOException;

   /**
    * Returns an array of terms that are considered synonyms for the given
    * <code>term</code>.
    *
    * @param term a search term.
    * @return an array of synonyms for the given <code>term</code> or an empty
    *         array if no synonyms are known.
    */
   public String[] getSynonyms(String term);
}
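To illustrate the contract of the two methods, here is a minimal sketch of a provider backed by a properties stream. SynonymProviderSketch is a hypothetical class that mirrors the method signatures above without implementing the real interface, so that it compiles on its own. The file format it reads (one "term=synonym1,synonym2" entry per line) is an assumption made for the example and is not necessarily the format expected by PropertiesSynonymProvider.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Illustrative provider with the same two methods as SynonymProvider. */
class SynonymProviderSketch {
    private final Map<String, String[]> synonyms = new HashMap<>();

    /** Loads "term=synonym1,synonym2" entries; a null stream leaves the map empty. */
    public void initialize(InputStream fsr) throws IOException {
        synonyms.clear();
        if (fsr == null) {
            return; // the configuration parameter was not set
        }
        Properties props = new Properties();
        props.load(fsr);
        for (String term : props.stringPropertyNames()) {
            String value = props.getProperty(term).trim();
            if (!value.isEmpty()) {
                // Split on commas and ignore whitespace around them.
                synonyms.put(term.toLowerCase(), value.split("\\s*,\\s*"));
            }
        }
    }

    /** Returns the known synonyms, or an empty array if there are none. */
    public String[] getSynonyms(String term) {
        String[] found = synonyms.get(term.toLowerCase());
        return found == null ? new String[0] : found.clone();
    }
}
```

Returning an empty array rather than null for unknown terms follows the Javadoc of getSynonyms() above.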

31.5. Highlighting (since 1.9)

An ExcerptProvider retrieves text excerpts for a node in the query result and marks up the words in the text that match the query terms.

By default, highlighting of the words that matched the query is disabled, because this feature requires additional information to be written to the search index. To enable it, add the following configuration parameter to the query-handler element in your JCR configuration file:

<param name="support-highlighting" value="true"/>

Additionally, there is a parameter that controls the format of the excerpt created. In JCR 1.9, the default is set to org.exoplatform.services.jcr.impl.core.query.lucene.DefaultHTMLExcerpt. The configuration parameter for this setting is:

<param name="excerptprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.DefaultXMLExcerpt"/>

31.5.1. DefaultXMLExcerpt

This excerpt provider creates an XML fragment of the following form:

<excerpt>
    <fragment>
        <highlight>exoplatform</highlight> implements both the mandatory
        XPath and optional SQL <highlight>query</highlight> syntax.
    </fragment>
    <fragment>
        Before parsing the XPath <highlight>query</highlight> in
        <highlight>exoplatform</highlight>, the statement is surrounded
    </fragment>
</excerpt>
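Because the excerpt is well-formed XML, it can be post-processed with a standard XML parser. The following is a minimal sketch that extracts the highlighted terms using the DOM API from the Java standard library; ExcerptXmlSketch and highlightedTerms are hypothetical names, and the sketch assumes the excerpt has already been obtained as a string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Extracts highlighted terms from an XML excerpt (hypothetical helper). */
class ExcerptXmlSketch {

    /** Returns the text of every highlight element in document order. */
    static List<String> highlightedTerms(String excerptXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(excerptXml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("highlight");
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            terms.add(nodes.item(i).getTextContent());
        }
        return terms;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<excerpt><fragment><highlight>exoplatform</highlight>"
            + " implements the <highlight>query</highlight> syntax.</fragment></excerpt>";
        System.out.println(highlightedTerms(xml));
    }
}
```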

31.5.2. DefaultHTMLExcerpt

This excerpt provider creates an HTML fragment of the following form:

<div>
    <span>
        <strong>exoplatform</strong> implements both the mandatory XPath
        and optional SQL <strong>query</strong> syntax.
    </span>
    <span>
        Before parsing the XPath <strong>query</strong> in
        <strong>exoplatform</strong>, the statement is surrounded
    </span>
</div>

31.5.3. How to use it

If you are using XPath, you must use the rep:excerpt() function in the last location step, just like you would select properties:

QueryManager qm = session.getWorkspace().getQueryManager();
Query q = qm.createQuery("//*[jcr:contains(., 'exoplatform')]/(@Title|rep:excerpt(.))", Query.XPATH);
QueryResult result = q.execute();
for (RowIterator it = result.getRows(); it.hasNext(); ) {
   Row r = it.nextRow();
   Value title = r.getValue("Title");
   Value excerpt = r.getValue("rep:excerpt(.)");
}

The above code searches for nodes that contain the word exoplatform and then gets the value of the Title property and an excerpt for each result node.

It is also possible to use a relative path in the call to Row.getValue() while the query statement remains the same. You may also use a relative path to a string property; the returned value is then an excerpt based on the string value of that property.

Both available excerpt providers create up to 3 fragments of about 150 characters each.

In SQL, the function is called excerpt() without the rep prefix, but the column in the RowIterator is nonetheless labeled rep:excerpt(.):

QueryManager qm = session.getWorkspace().getQueryManager();
Query q = qm.createQuery("select excerpt(.) from nt:resource where contains(., 'exoplatform')", Query.SQL);
QueryResult result = q.execute();
for (RowIterator it = result.getRows(); it.hasNext(); ) {
   Row r = it.nextRow();
   Value excerpt = r.getValue("rep:excerpt(.)");
}

31.6. SpellChecker

The Lucene-based query handler implementation supports a pluggable spell checker mechanism. By default, spell checking is not available and you have to configure it first. See the spellCheckerClass parameter on the Search Configuration page. JCR currently provides one implementation class, which uses the lucene-spellchecker contrib module. The dictionary is derived from the full-text indexed content of the workspace and is updated periodically. You can configure the refresh interval by picking one of the available inner classes of org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker:

OneMinuteRefreshInterval
FiveMinutesRefreshInterval
ThirtyMinutesRefreshInterval
OneHourRefreshInterval
SixHoursRefreshInterval
TwelveHoursRefreshInterval
OneDayRefreshInterval

For example, if you want a refresh interval of six hours, the class name is: org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker$SixHoursRefreshInterval. If you use org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker, the refresh interval will be one hour.

The spell checker dictionary is stored as a Lucene index under "index-dir"/spellchecker. If it does not exist, a background thread creates it on startup. The dictionary refresh also runs in a background thread so that regular queries are not blocked.
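The class names for the refresh intervals follow the Java binary-name convention for nested classes, in which the outer and inner class names are joined with a dollar sign. The following is a minimal sketch that derives the value to configure from the interval names listed above; SpellCheckerClassNameSketch, its Refresh enum and className() are hypothetical helpers written for illustration.

```java
/** Derives spell checker class names for configuration (hypothetical helper). */
class SpellCheckerClassNameSketch {
    static final String BASE =
        "org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker";

    /** The inner class names listed in this section. */
    enum Refresh {
        ONE_MINUTE("OneMinuteRefreshInterval"),
        FIVE_MINUTES("FiveMinutesRefreshInterval"),
        THIRTY_MINUTES("ThirtyMinutesRefreshInterval"),
        ONE_HOUR("OneHourRefreshInterval"),
        SIX_HOURS("SixHoursRefreshInterval"),
        TWELVE_HOURS("TwelveHoursRefreshInterval"),
        ONE_DAY("OneDayRefreshInterval");

        final String simpleName;

        Refresh(String simpleName) {
            this.simpleName = simpleName;
        }
    }

    /** Returns the outer class name for null, otherwise Outer$Inner. */
    static String className(Refresh refresh) {
        return refresh == null ? BASE : BASE + "$" + refresh.simpleName;
    }

    public static void main(String[] args) {
        System.out.println(className(Refresh.SIX_HOURS));
    }
}
```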

31.6.1. How do I use it?

You can spell check a fulltext statement either with an XPath or a SQL query:

QueryManager qm = session.getWorkspace().getQueryManager();
// rep:spellcheck('explatform') will always evaluate to true
Query query = qm.createQuery("/jcr:root[rep:spellcheck('explatform')]/(rep:spellcheck())", Query.XPATH);
RowIterator rows = query.execute().getRows();
// the above query will always return the root node no matter what string we check
Row r = rows.nextRow();
// get the result of the spell checking
Value v = r.getValue("rep:spellcheck()");
if (v == null) {
   // no suggestion returned, the spelling is correct or the spell checker
   // does not know how to correct it.
} else {
   String suggestion = v.getString();
}

And the same using SQL:

QueryManager qm = session.getWorkspace().getQueryManager();
// SPELLCHECK('explatform') will always evaluate to true
Query query = qm.createQuery("SELECT rep:spellcheck() FROM nt:base WHERE jcr:path = '/' AND SPELLCHECK('explatform')", Query.SQL);
RowIterator rows = query.execute().getRows();
// the above query will always return the root node no matter what string we check
Row r = rows.nextRow();
// get the result of the spell checking
Value v = r.getValue("rep:spellcheck()");
if (v == null) {
   // no suggestion returned, the spelling is correct or the spell checker
   // does not know how to correct it.
} else {
   String suggestion = v.getString();
}

31.7. Similarity (Since 1.12)

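Starting with version 1.12, JCR allows you to search for nodes that are similar to an existing node. Similarity is determined by looking up terms that are common to the nodes, and a term is considered only if it meets all of the following conditions: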

Only terms with at least 4 characters are considered.
Only terms that occur at least 2 times in the source node are considered.
Only terms that occur in at least 5 nodes are considered.
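The three conditions above can be read as a filter over term statistics. The following is a minimal sketch of that filter, assuming the term counts are already available as maps; SimilarityTermFilterSketch and relevantTerms are hypothetical names, and this is an illustration of the rules rather than the code the query handler actually runs.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Applies the three term conditions to precomputed counts (hypothetical helper). */
class SimilarityTermFilterSketch {
    static final int MIN_TERM_LENGTH = 4;      // characters in the term
    static final int MIN_SOURCE_FREQUENCY = 2; // occurrences in the source node
    static final int MIN_NODE_FREQUENCY = 5;   // nodes that contain the term

    /**
     * @param sourceFrequency term to number of occurrences in the source node
     * @param nodeFrequency   term to number of nodes that contain the term
     * @return the terms that satisfy all three conditions, sorted
     */
    static Set<String> relevantTerms(Map<String, Integer> sourceFrequency,
                                     Map<String, Integer> nodeFrequency) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : sourceFrequency.entrySet()) {
            String term = entry.getKey();
            boolean longEnough = term.length() >= MIN_TERM_LENGTH;
            boolean frequentInSource = entry.getValue() >= MIN_SOURCE_FREQUENCY;
            boolean frequentInNodes =
                nodeFrequency.getOrDefault(term, 0) >= MIN_NODE_FREQUENCY;
            if (longEnough && frequentInSource && frequentInNodes) {
                result.add(term);
            }
        }
        return result;
    }
}
```

For example, a term such as "jcr" is rejected for being shorter than 4 characters regardless of how often it occurs.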

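The similarity function requires highlighting support to be enabled. Make sure the following parameter is set for the query handler in your JCR configuration file:

<param name="support-highlighting" value="true"/>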

The functions are called rep:similar() (in XPath) and similar() (in SQL) and have two arguments:

relativePath: a relative path to a descendant node or . for the current node.
absoluteStringPath: a string literal that contains the path to the node for which to find similar nodes.

Warning

Relative path is not supported yet.

Examples:

//element(*, nt:resource)[rep:similar(., '/parentnode/node.txt/jcr:content')]

Finds nt:resource nodes that are similar to the node at path /parentnode/node.txt/jcr:content.