JBoss.org Community Documentation

Chapter 31. Searching Repository Content

31.1. Introduction
31.2. Bi-directional RangeIterator (since 1.9)
31.3. Fuzzy Searches (since 1.0)
31.4. SynonymSearch (since 1.9)
31.5. High-lighting (Since 1.9)
31.5.1. DefaultXMLExcerpt
31.5.2. DefaultHTMLExcerpt
31.5.3. How to use it
31.6. SpellChecker
31.6.1. How do I use it?
31.7. Similarity (Since 1.12)

The JCR configuration file is located at .../portal/WEB-INF/conf/jcr/repository-configuration.xml. For more information about index configuration, see Search Configuration.

QueryResult.getNodes() returns a bi-directional NodeIterator implementation, which can be cast to TwoWayRangeIterator in order to skip backwards.

TwoWayRangeIterator interface:

/**
 * Skip a number of elements in the iterator.
 * 
 * @param skipNum the non-negative number of elements to skip
 * @throws java.util.NoSuchElementException if skipped past the first element
 *           in the iterator.
 */
public void skipBack(long skipNum);

Usage:

NodeIterator iter = queryResult.getNodes();
while (iter.hasNext()) {
  if (skipForward) {
    iter.skip(10); // Skip 10 nodes in the forward direction
  } else if (skipBack) {
    // The iterator returned by getNodes() also implements TwoWayRangeIterator
    TwoWayRangeIterator backIter = (TwoWayRangeIterator) iter;
    backIter.skipBack(10); // Skip 10 nodes back
  }
  // ...
}
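To make the forward and backward skipping semantics concrete, the following is a minimal, self-contained sketch of a list-backed iterator. ListBackedTwoWayIterator is a hypothetical class written for illustration only; it is not the eXo JCR implementation, which iterates over query results rather than an in-memory list. It only assumes the contract stated above: skipBack throws NoSuchElementException when skipping past the first element.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical list-backed iterator illustrating skip() and skipBack(). */
class ListBackedTwoWayIterator<T> implements Iterator<T> {
  private final List<T> items;
  private int position = 0; // index of the element that next() will return

  ListBackedTwoWayIterator(List<T> items) {
    this.items = items;
  }

  @Override
  public boolean hasNext() {
    return position < items.size();
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return items.get(position++);
  }

  /** Skips forward; fails if this would move past the last element. */
  public void skip(long skipNum) {
    if (skipNum < 0) {
      throw new IllegalArgumentException("skipNum must be non-negative");
    }
    if (skipNum > items.size() - position) {
      throw new NoSuchElementException("skipped past the last element");
    }
    position += (int) skipNum;
  }

  /** Skips backward; fails if this would move before the first element. */
  public void skipBack(long skipNum) {
    if (skipNum < 0) {
      throw new IllegalArgumentException("skipNum must be non-negative");
    }
    if (skipNum > position) {
      throw new NoSuchElementException("skipped past the first element");
    }
    position -= (int) skipNum;
  }

  public static void main(String[] args) {
    ListBackedTwoWayIterator<String> iter =
        new ListBackedTwoWayIterator<>(Arrays.asList("a", "b", "c", "d", "e"));
    iter.skip(3);
    System.out.println(iter.next()); // prints "d"
    iter.skipBack(2);
    System.out.println(iter.next()); // prints "c"
  }
}
```

Note that next() advances the position by one, so skipping back by two after reading "d" lands on "c", not on "b".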

JCR supports Lucene fuzzy searches, as described in Apache Lucene - Query Parser Syntax.

To use a fuzzy search, append a tilde ( ~ ) to the search term in the contains() function, as in the following query:

QueryManager qman = session.getWorkspace().getQueryManager();
Query q = qman.createQuery("select * from nt:base where contains(field, 'ccccc~')", Query.SQL);
QueryResult res = q.execute();

Searching with synonyms is integrated in the jcr:contains() function and uses the same syntax as synonym searches in Google. If a search term is prefixed by a tilde symbol ( ~ ), synonyms of the search term are also taken into consideration. For example:

SQL: select * from nt:resource where contains(., '~parameter')

XPath: //element(*, nt:resource)[jcr:contains(., '~parameter')]

This feature is disabled by default. To enable it, add the following configuration parameters to the query-handler element in your JCR configuration file:

<param name="synonymprovider-config-path" value="..your path to configuration file....."/>
<param name="synonymprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.PropertiesSynonymProvider"/>

SynonymProvider interface:

/**
 * <code>SynonymProvider</code> defines an interface for a component that
 * returns synonyms for a given term.
 */
public interface SynonymProvider {

   /**
    * Initializes the synonym provider and passes the file system resource to
    * the synonym provider configuration defined by the configuration value of
    * the <code>synonymProviderConfigPath</code> parameter. The resource may be
    * <code>null</code> if the configuration parameter is not set.
    *
    * @param fsr the file system resource to the synonym provider
    *            configuration.
    * @throws IOException if an error occurs while initializing the synonym
    *                     provider.
    */
   public void initialize(InputStream fsr) throws IOException;

   /**
    * Returns an array of terms that are considered synonyms for the given
    * <code>term</code>.
    *
    * @param term a search term.
    * @return an array of synonyms for the given <code>term</code> or an empty
    *         array if no synonyms are known.
    */
   public String[] getSynonyms(String term);
}
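As an illustration of the contract above, here is a minimal sketch of a custom provider. Everything in it is an assumption made for the example: CommaSeparatedSynonymProvider is a hypothetical class, the file format (one term per property, with comma-separated synonyms) is invented here and is not necessarily the format read by PropertiesSynonymProvider, and the interface is re-declared locally only so that the sketch compiles on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

/** Hypothetical provider: each property maps a term to comma-separated synonyms. */
class CommaSeparatedSynonymProvider implements SynonymProvider {
  private final Map<String, String[]> synonyms = new HashMap<>();

  @Override
  public void initialize(InputStream fsr) throws IOException {
    synonyms.clear();
    if (fsr == null) {
      return; // no configuration set: every lookup yields an empty array
    }
    Properties props = new Properties();
    props.load(fsr);
    for (String term : props.stringPropertyNames()) {
      String value = props.getProperty(term).trim();
      if (!value.isEmpty()) {
        // Terms are stored in lower case so that lookups are case-insensitive
        synonyms.put(term.toLowerCase(Locale.ROOT), value.split("\\s*,\\s*"));
      }
    }
  }

  @Override
  public String[] getSynonyms(String term) {
    String[] found = synonyms.get(term.toLowerCase(Locale.ROOT));
    // Return a copy so that callers cannot modify the internal state
    return found == null ? new String[0] : found.clone();
  }

  public static void main(String[] args) throws IOException {
    String config = "parameter=argument, option\n";
    CommaSeparatedSynonymProvider provider = new CommaSeparatedSynonymProvider();
    provider.initialize(
        new ByteArrayInputStream(config.getBytes(StandardCharsets.ISO_8859_1)));
    System.out.println(String.join(",", provider.getSynonyms("parameter")));
    System.out.println(provider.getSynonyms("unknown").length);
  }
}

/** Local copy of the interface shown above, so the sketch is self-contained. */
interface SynonymProvider {
  void initialize(InputStream fsr) throws IOException;

  String[] getSynonyms(String term);
}
```

In a real deployment the class would implement the SynonymProvider interface shipped with JCR and would be referenced through the synonymprovider-class parameter.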

An ExcerptProvider retrieves text excerpts for a node in the query result and marks up the words in the text that match the query terms.

By default, highlighting of the words that match the query is disabled, because this feature requires additional information to be written to the search index. To enable it, add the following configuration parameter to the query-handler element in your JCR configuration file:

<param name="support-highlighting" value="true"/>

Additionally, there is a parameter that controls the format of the excerpt created. In JCR 1.9, the default is org.exoplatform.services.jcr.impl.core.query.lucene.DefaultHTMLExcerpt. To select a different format, such as DefaultXMLExcerpt, set the following configuration parameter:

<param name="excerptprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.DefaultXMLExcerpt"/>

The Lucene-based query handler implementation supports a pluggable spell checker mechanism. By default, spell checking is not available and you have to configure it first; see the spellCheckerClass parameter on the Search Configuration page. JCR currently provides an implementation class that uses the lucene-spellchecker. The dictionary is derived from the fulltext-indexed content of the workspace and is updated periodically. You can configure the refresh interval by picking one of the available inner classes of org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker:

  • OneMinuteRefreshInterval

  • FiveMinutesRefreshInterval

  • ThirtyMinutesRefreshInterval

  • OneHourRefreshInterval

  • SixHoursRefreshInterval

  • TwelveHoursRefreshInterval

  • OneDayRefreshInterval

For example, if you want a refresh interval of six hours, the class name is: org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker$SixHoursRefreshInterval. If you use org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker, the refresh interval will be one hour.
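The $ in the class name above is the separator the JVM uses in the binary name of a nested class, which is the form needed wherever a class is referenced by name, as in a configuration parameter. The following minimal sketch shows this with a stand-in class; DemoSpellChecker and its nested class are hypothetical and only mirror the outer/inner layout described above.

```java
/** Hypothetical stand-in that mirrors the outer/inner class layout. */
class DemoSpellChecker {

  /** Nested class, analogous to one of the refresh interval classes. */
  static final class SixHoursRefreshInterval extends DemoSpellChecker {
  }

  public static void main(String[] args) throws ClassNotFoundException {
    // getName() returns the binary name: outer and nested names joined by '$'
    String binaryName = SixHoursRefreshInterval.class.getName();
    System.out.println(binaryName);

    // The binary name is what reflective loading expects
    Class<?> loaded = Class.forName(binaryName);
    System.out.println(loaded == SixHoursRefreshInterval.class);
  }
}
```

Writing the name with a dot instead of $ would not resolve when the class is loaded by name.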

The spell checker dictionary is stored as a Lucene index under "index-dir"/spellchecker. If it does not exist, a background thread will create it on startup. Similarly, the dictionary refresh is done in a background thread so that regular queries are not blocked.

Starting with version 1.12, JCR allows you to search for nodes that are similar to an existing node.

Similarity is determined by looking up terms that are common to nodes. There are some conditions that must be met for a term to be considered. This is required to limit the number of possibly relevant terms.

Note: The similarity functionality requires that highlighting support is enabled. Make sure that the following parameter is set for the query handler in your workspace.xml.

<param name="support-highlighting" value="true"/>

The functions are called rep:similar() (in XPath) and similar() (in SQL) and have two arguments:

  • relativePath: a relative path to a descendant node, or . for the current node.

  • absoluteStringPath: a string literal that contains the path to the node for which to find similar nodes.

Examples:

//element(*, nt:resource)[rep:similar(., '/parentnode/node.txt/jcr:content')]

Finds nt:resource nodes that are similar to the node at the path /parentnode/node.txt/jcr:content.
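The example above uses the XPath form. As a rough sketch, the following helper builds both the XPath statement and an SQL counterpart from a node type and an absolute path. SimilarityQueries and its methods are hypothetical names; the SQL statement is an assumption derived from the function name similar() and the contains() syntax used elsewhere in this chapter, and quoting by doubling single quotes is likewise an assumption about string literals in both languages.

```java
/** Hypothetical helper that builds similarity query statements. */
class SimilarityQueries {

  /** Wraps a value in single quotes, doubling any embedded single quote. */
  static String quote(String literal) {
    return "'" + literal.replace("'", "''") + "'";
  }

  /** XPath form, using rep:similar(). */
  static String xpath(String nodeType, String absolutePath) {
    return "//element(*, " + nodeType + ")[rep:similar(., " + quote(absolutePath) + ")]";
  }

  /** SQL form, using similar(); assumed to mirror the contains() syntax. */
  static String sql(String nodeType, String absolutePath) {
    return "select * from " + nodeType + " where similar(., " + quote(absolutePath) + ")";
  }

  public static void main(String[] args) {
    String path = "/parentnode/node.txt/jcr:content";
    System.out.println(xpath("nt:resource", path));
    System.out.println(sql("nt:resource", path));
  }
}
```

The resulting strings would be passed to QueryManager.createQuery() together with Query.XPATH or Query.SQL respectively, in the same way as the fuzzy search query earlier in this chapter.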