Lucene - ModeShape 5

Overview

The Lucene index provider stores indexes and executes queries using Lucene 5. It supports all five index kinds, but each kind handles only a subset of the JCR query constraints and operands:

Index Kind	Supported Features	Unsupported Features	Multi-column support
Value	most constraints and operands, including `LIKE`; both single and multi-valued properties	`JOIN` and `FULL TEXT SEARCH` constraints	yes
Enumerated	same as `Value` indexes	same as `Value` indexes	yes
Unique	same as `Value` indexes	same as `Value` indexes	yes
Nodetype	same as `Value` indexes	same as `Value` indexes	no
Text	`FULL TEXT SEARCH` constraint	any other JCR query operands and constraints	no

Although most index kinds support multiple columns in the index definition, defining more than one column per index is not recommended.
Lucene does not support merging Documents, so every update to a multi-column index has to perform the merge in memory and then overwrite the existing Document, which incurs a significant performance penalty.
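As a sketch of that recommendation, the fragment below defines two single-column value indexes instead of one index with two columns. The index names (`titleIndex`, `descriptionIndex`), the node type `nt:testType`, and the `title` and `description` properties are hypothetical; the `indexes` section and the `propertyName(TYPE)` column syntax follow the text index definition at the end of this page.

```json
"indexes" : {
    "titleIndex" : {
        "kind" : "value",
        "provider" : "lucene",
        "nodeType" : "nt:testType",
        "columns" : "title(STRING)"
    },
    "descriptionIndex" : {
        "kind" : "value",
        "provider" : "lucene",
        "nodeType" : "nt:testType",
        "columns" : "description(STRING)"
    }
}
```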

Configuration

The following Lucene-related attributes can be configured:

Attribute	Description	Optional	Default value
`directory`	the path on disk where indexes are stored	yes, if `path` and `relativeTo` are present
`path`	a path relative to the `relativeTo` folder	yes, if `directory` is present
`relativeTo`	the folder against which `path` is resolved	yes, if `directory` is present
`directoryClass`	the fully qualified name of the Lucene `Directory` class	yes	`FSDirectory.open`
`analyzerClass`	the fully qualified name of the Lucene `Analyzer` class	yes	`StandardAnalyzer`
`lockFactoryClass`	the fully qualified name of the Lucene `LockFactory` class	yes	`FSLockFactory.getDefault()`
`codec`	the name of the Lucene codec (for example `Lucene53`)	yes	`Codec.getDefault()`

JSON

The standard JSON configuration looks like this:

"indexProviders" : {
    "lucene" : {
        "classname" : "lucene",
        "directory" : "target/indexes"
    }
},

An advanced configuration, which overrides the Lucene defaults, looks like this:

"indexProviders" : {
    "lucene" : {
        "classname" : "lucene",
        "lockFactoryClass" : "org.apache.lucene.store.NoLockFactory",
        "directoryClass" : "org.apache.lucene.store.RAMDirectory",
        "analyzerClass" : "org.apache.lucene.analysis.ro.RomanianAnalyzer",
        "codec" : "Lucene53"
    }
},

In either case, make sure the index provider artifact is on your classpath:

  <dependency>
    <groupId>org.modeshape</groupId>
    <artifactId>modeshape-lucene-index-provider</artifactId>
  </dependency>

JBoss AS

<index-provider name="lucene" classname="lucene" path="modeshape/artifacts/indexes/" relative-to="jboss.server.data.dir" module="org.modeshape.index-provider.lucene"/>

or

<index-provider name="lucene" classname="lucene" module="org.modeshape.index-provider.lucene" 
                lockFactoryClass="org.apache.lucene.store.NoLockFactory" directoryClass="org.apache.lucene.store.RAMDirectory" analyzerClass="org.apache.lucene.analysis.ro.RomanianAnalyzer"
                codec="Lucene53"/>

Make sure you set the module="org.modeshape.index-provider.lucene" attribute; without it, the server cannot load the Lucene index provider.

Additional considerations

When running full-text search queries with CONTAINS, JCR makes the property name optional: you can use `*` to search all properties of the selector. For example:

"select [jcr:path] from [nt:testType] as n where contains(n.*,'the quick Dog')"

as opposed to

"select [jcr:path] from [nt:testType] as n where contains(FTSProp,'the quick Dog')"

If you use the first style of query, make sure that only one text index applies to [nt:testType]; otherwise the query will not return any results. If you need multiple text indexes for the same node type, make sure your full-text search queries name the property explicitly (as in the second example).
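For example, the following sketch defines two text indexes for [nt:testType], one per property. `FTSProp` comes from the query above, while `otherProp` and both index names are hypothetical. With both indexes defined, the `contains(n.*, ...)` query would return no results, whereas `contains(FTSProp, ...)` can still be answered because it names its property explicitly.

```json
"indexes" : {
    "ftsPropText" : {
        "kind" : "text",
        "provider" : "lucene",
        "nodeType" : "nt:testType",
        "columns" : "FTSProp(STRING)"
    },
    "otherPropText" : {
        "kind" : "text",
        "provider" : "lucene",
        "nodeType" : "nt:testType",
        "columns" : "otherProp(STRING)"
    }
}
```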

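If you're using text extractors and want to run full-text searches on the content of [nt:file] nodes, note that the binary content is stored in the jcr:data property of the file's jcr:content child node, which is of type [nt:resource]. Define your text index as: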

"textFromFiles" : {
    "kind" : "text",
    "provider" : "lucene",
    "nodeType" : "nt:resource",
    "columns" : "jcr:data(BINARY)"
}
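The `provider` attribute of an index definition refers to an index provider by the name under which it is registered in `indexProviders`. A sketch of how the text index above sits next to the standard provider configuration, assuming the rest of the repository configuration (repository name, workspaces, text extractors) is defined elsewhere:

```json
"indexProviders" : {
    "lucene" : {
        "classname" : "lucene",
        "directory" : "target/indexes"
    }
},
"indexes" : {
    "textFromFiles" : {
        "kind" : "text",
        "provider" : "lucene",
        "nodeType" : "nt:resource",
        "columns" : "jcr:data(BINARY)"
    }
}
```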