JBoss.orgCommunity Documentation
Each indexable property of a node is processed by a Lucene analyzer and stored in the Lucene index. This is called indexing a property. After that, a fulltext search can be performed on the indexed properties.
The purpose of an analyzer is to transform every string stored in the index into a well-defined form. The same analyzer is applied at search time, so that the query string is transformed in the same way as the indexed text.
Therefore, performing the same query using different analyzers can return different results.
Now, let's see how the same string is transformed by different analyzers.
Table 32.1. "The quick brown fox jumped over the lazy dogs"
Analyzer | Parsed |
---|---|
org.apache.lucene.analysis.WhitespaceAnalyzer | [The] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs] |
org.apache.lucene.analysis.SimpleAnalyzer | [the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs] |
org.apache.lucene.analysis.StopAnalyzer | [quick] [brown] [fox] [jumped] [over] [lazy] [dogs] |
org.apache.lucene.analysis.standard.StandardAnalyzer | [quick] [brown] [fox] [jumped] [over] [lazy] [dogs] |
org.apache.lucene.analysis.snowball.SnowballAnalyzer | [quick] [brown] [fox] [jump] [over] [lazi] [dog] |
org.apache.lucene.analysis.standard.StandardAnalyzer (configured without stop words; the JCR default analyzer) | [the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs] |
Table 32.2. "XY&Z Corporation - xyz@example.com"
Analyzer | Parsed |
---|---|
org.apache.lucene.analysis.WhitespaceAnalyzer | [XY&Z] [Corporation] [-] [xyz@example.com] |
org.apache.lucene.analysis.SimpleAnalyzer | [xy] [z] [corporation] [xyz] [example] [com] |
org.apache.lucene.analysis.StopAnalyzer | [xy] [z] [corporation] [xyz] [example] [com] |
org.apache.lucene.analysis.standard.StandardAnalyzer | [xy&z] [corporation] [xyz@example] [com] |
org.apache.lucene.analysis.snowball.SnowballAnalyzer | [xy&z] [corpor] [xyz@exampl] [com] |
org.apache.lucene.analysis.standard.StandardAnalyzer (configured without stop words; the JCR default analyzer) | [xy&z] [corporation] [xyz@example] [com] |
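The differences between the first three analyzers in the tables above can be approximated in plain Java. The following is only a rough sketch that does not use Lucene itself: the class `AnalyzerSketch`, its method names, and the short stop word list are assumptions made for illustration (Lucene's real English stop word list is longer), and StandardAnalyzer and SnowballAnalyzer are not modeled because their grammar and stemming rules are more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative approximation only; not the Lucene implementation.
final class AnalyzerSketch {

    // Hypothetical, shortened stop word list used for this sketch.
    static final Set<String> STOP_WORDS =
        Set.of("a", "an", "and", "in", "is", "of", "the", "to");

    // Roughly like WhitespaceAnalyzer: split on whitespace, keep case and punctuation.
    static List<String> whitespace(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Roughly like SimpleAnalyzer: split on anything that is not a letter, then lowercase.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Roughly like StopAnalyzer: the simple tokens with stop words removed.
    static List<String> stop(String text) {
        List<String> tokens = simple(text);
        tokens.removeIf(STOP_WORDS::contains);
        return tokens;
    }

    public static void main(String[] args) {
        String text = "XY&Z Corporation - xyz@example.com";
        System.out.println(whitespace(text));
        System.out.println(simple(text));
        System.out.println(stop(text));
    }
}
```

Because the query string goes through the same transformation as the indexed text, a term that an analyzer drops or splits at indexing time is dropped or split in the query as well.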
StandardAnalyzer is the default analyzer in the eXo JCR search engine, but it is configured without stop words.
You can assign your own analyzer as described in Search Configuration.
Different property types are indexed in different ways, which determines whether a property can be found by a fulltext search on that specific property.
Only two property types are indexed as fulltext searchable: STRING and BINARY.
Table 32.3. Fulltext search by different properties
Property Type | Fulltext search by all properties | Fulltext search by exact property |
---|---|---|
STRING | YES | YES |
BINARY | YES | NO |
For example, take the property jcr:data, which is BINARY. Its content is stored correctly, but you will never find any string in it with a query like:
SELECT * FROM nt:resource WHERE CONTAINS(jcr:data, 'some string')
This is because BINARY is not searchable by a fulltext search on an exact property.
However, the next query will return a result (provided, of course, that the node contains the searched data):
SELECT * FROM nt:resource WHERE CONTAINS( * , 'some string')
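One way to picture Table 32.3 is a toy index with two kinds of fields: a node-wide fulltext field that receives text from every STRING and BINARY property, and per-property fields that only STRING properties get. The sketch below is a simplified model under that assumption and not the actual eXo JCR index layout; the class `PropertyIndexSketch`, the field name `*`, and the substring matching are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy model of Table 32.3; the real index structure is an implementation detail.
final class PropertyIndexSketch {

    enum Type { STRING, BINARY }

    // Hypothetical name of the node-wide fulltext field, mirroring CONTAINS(*, ...).
    static final String ALL = "*";

    // Field name -> concatenated, lowercased text.
    private final Map<String, StringBuilder> fields = new HashMap<>();

    // Every fulltext-indexable property feeds the node-wide field;
    // only STRING properties also get a field of their own.
    void index(String property, Type type, String text) {
        append(ALL, text);
        if (type == Type.STRING) {
            append(property, text);
        }
    }

    private void append(String field, String text) {
        fields.computeIfAbsent(field, k -> new StringBuilder())
              .append(text.toLowerCase(Locale.ROOT))
              .append(' ');
    }

    // Simplified CONTAINS: a case-insensitive substring check on one field.
    boolean contains(String field, String phrase) {
        StringBuilder indexed = fields.get(field);
        return indexed != null && indexed.indexOf(phrase.toLowerCase(Locale.ROOT)) >= 0;
    }

    public static void main(String[] args) {
        PropertyIndexSketch index = new PropertyIndexSketch();
        index.index("jcr:data", Type.BINARY, "file content with some string inside");
        System.out.println(index.contains("jcr:data", "some string"));
        System.out.println(index.contains(ALL, "some string"));
    }
}
```

In this model a search on `jcr:data` finds nothing because that field was never created, while the node-wide search still sees the text extracted from the binary value.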
First of all, we fill the repository with nodes of mixin type 'mix:title' and different values of the 'jcr:description' property.
root
document1 (mix:title) jcr:description = "The quick brown fox jumped over the lazy dogs"
document2 (mix:title) jcr:description = "Brown fox live in forest."
document3 (mix:title) jcr:description = "Fox is a nice animal."
Let's take a closer look at the effect of analyzers. In the first case we use the base JCR settings, so, as mentioned above, the string "The quick brown fox jumped over the lazy dogs" is transformed into the set {[the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]}.
// make SQL query
QueryManager queryManager = workspace.getQueryManager();
String sqlStatement = "SELECT * FROM mix:title WHERE CONTAINS(jcr:description, 'the')";
// create query
Query query = queryManager.createQuery(sqlStatement, Query.SQL);
// execute query and fetch result
QueryResult result = query.execute();
The NodeIterator obtained from result.getNodes() will return "document1".
Now change the default analyzer to org.apache.lucene.analysis.StopAnalyzer, fill the repository again (the new analyzer must process the node properties), and run the same query. It will return nothing, because stop words such as "the" are excluded from the parsed token set.
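The effect can be reproduced with a small in-memory model of the three documents. This is a sketch under simplifying assumptions, not the eXo JCR or Lucene code path: the class `StopWordSearchSketch`, its methods, and the short stop word list are hypothetical, and a document counts as a hit when it contains every token of the analyzed query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Illustrative model only; it mimics the behavior described above without Lucene.
final class StopWordSearchSketch {

    // Hypothetical, shortened stop word list.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "in", "is", "the");

    // Analyzer without stop words: lowercase the text and split on non-letters.
    static List<String> lowercaseWords(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Analyzer with stop words: the same tokens minus the stop list.
    static List<String> withoutStopWords(String text) {
        List<String> tokens = lowercaseWords(text);
        tokens.removeIf(STOP_WORDS::contains);
        return tokens;
    }

    // The jcr:description values of the three example nodes.
    static Map<String, String> documents() {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("document1", "The quick brown fox jumped over the lazy dogs");
        docs.put("document2", "Brown fox live in forest.");
        docs.put("document3", "Fox is a nice animal.");
        return docs;
    }

    // The same analyzer is applied to the documents and to the query text.
    static List<String> search(String queryText, Function<String, List<String>> analyzer) {
        List<String> queryTokens = analyzer.apply(queryText);
        List<String> hits = new ArrayList<>();
        if (queryTokens.isEmpty()) {
            return hits; // the whole query was made of stop words
        }
        for (Map.Entry<String, String> doc : documents().entrySet()) {
            if (analyzer.apply(doc.getValue()).containsAll(queryTokens)) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("the", StopWordSearchSketch::lowercaseWords));
        System.out.println(search("the", StopWordSearchSketch::withoutStopWords));
    }
}
```

In this model a search for a word that is not a stop word, such as "fox", matches all three documents under either analyzer; only queries made of stop words are affected by the switch.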