jackrabbit-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Jackrabbit Wiki] Update of "IndexingConfiguration" by ardschrijvers
Date Mon, 10 Sep 2007 08:42:02 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The following page has been changed by ardschrijvers:
http://wiki.apache.org/jackrabbit/IndexingConfiguration

------------------------------------------------------------------------------
  </configuration>
  }}}
  
+ === Index Analyzers ===
+ 
+ This part of the configuration defines how a property is analyzed. If a property
has an analyzer configured, that analyzer is used both for indexing and for searching the property.
For example:
+ 
+ {{{
+ <?xml version="1.0"?>
+ <!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
+ <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+   <analyzers> 
+         <analyzer class="org.apache.lucene.analysis.KeywordAnalyzer">
+             <property>mytext</property>
+         </analyzer>
+         <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer">
+             <property>mytext2</property>
+         </analyzer>
+   </analyzers> 
+ </configuration>
+ }}}
+ 
+ The configuration above means that the property "mytext" is indexed (and searched) with the
Lucene KeywordAnalyzer throughout the entire workspace, and the property "mytext2" with the
WhitespaceAnalyzer. Per-property analyzers are particularly useful when properties hold text
in different languages.
+ 
+ Be aware, though, that with per-property analyzers a search within a property can behave
differently from a search within the node scope.
+ Suppose your query is:
+ 
+ {{{
+ xpath = "//*[jcr:contains(mytext,'analyzer')]"
+ }}}
+ 
+ and the property "mytext" contains the text "testing my analyzers".
+ 
+ With no analyzer configured for the property "mytext", this query does not return a hit
for the node holding the property above, since the global analyzer in this example does not stem
"analyzers" to "analyzer". The query xpath = "//*[jcr:contains(.,'analyzer')]" does not return
a hit either. Keep in mind that specific analyzers can only be set on node properties; node scope
indexing and analyzing is always done with the globally defined analyzer in the SearchIndex
element. Now, if the analyzer used to index the "mytext" property above is changed to
+ 
+ {{{
+ <analyzer class="org.apache.lucene.analysis.de.GermanAnalyzer">
+      <property>mytext</property>
+ </analyzer>
+ }}}
+ 
+ and the same search is run again, then {{{xpath = "//*[jcr:contains(mytext,'analyzer')]"}}}
returns a hit because of stemming. The other search, {{{xpath = "//*[jcr:contains(.,'analyzer')]"}}},
still returns no result, since the node scope is indexed with the global analyzer, which
in this case does not do stemming.
+ 
+ So, when using analyzers for specific properties, a search text may produce a hit in a
property while the same search text produces no hit in the node scope of the node that holds
that property.
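+ 
+ For reference, the global analyzer that governs node scope indexing is set on the SearchIndex
element of the workspace configuration. The fragment below is only a sketch: the file locations
and the choice of StandardAnalyzer are assumptions for illustration, and the parameter names
should be checked against the Jackrabbit version in use.
+ 
+ {{{
+ <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
+   <!-- assumed index location -->
+   <param name="path" value="${wsp.home}/index"/>
+   <!-- assumed location of an indexing configuration file like the one shown above -->
+   <param name="indexingConfiguration" value="${wsp.home}/indexing_configuration.xml"/>
+   <!-- global analyzer: applies to node scope (.) and to properties without their own analyzer -->
+   <param name="analyzer" value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
+ </SearchIndex>
+ }}}
+ 
+ With a setup like this, a query on the node scope is analyzed with the global analyzer, while
a query on "mytext" is analyzed with whatever analyzer the indexing configuration assigns to it.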
+ 
+ 
  '''Important note''': Index rules, index aggregates and index analyzers all influence how
content is indexed in Jackrabbit. If you change the configuration, the existing content is not
automatically re-indexed according to the new rules. You therefore have to re-index the content
manually whenever you change the configuration!
  
