jackrabbit-dev mailing list archives

From "Ard Schrijvers (JIRA)" <j...@apache.org>
Subject [jira] Created: (JCR-1079) Extend the IndexingConfiguration to allow configuration of reuseable analyzers
Date Thu, 23 Aug 2007 08:50:30 GMT
Extend the IndexingConfiguration to allow configuration of reuseable analyzers
------------------------------------------------------------------------------

                 Key: JCR-1079
                 URL: https://issues.apache.org/jira/browse/JCR-1079
             Project: Jackrabbit
          Issue Type: New Feature
    Affects Versions: 1.3.1
            Reporter: Ard Schrijvers
            Priority: Minor
             Fix For: 1.4


It should be possible to configure an XML block of analyzers in indexing_configuration.xml.
Within each <index-rule>, an analyzer can then be assigned to a property, which means that
property will be analyzed with that specific analyzer. First and foremost, this enables
multilingual indexing.


Documentation needs to be added explaining the difference between searching in the node
scope [jcr:contains(.,'foo')] and searching in a specific property [jcr:contains(@myprop,'foo')].
The node scope will always be indexed and searched with the default analyzer, which can be
configured in workspace.xml in the <SearchIndex> element.
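
For reference, a minimal sketch of how the default analyzer might be set in workspace.xml.
It assumes the "analyzer" param of the <SearchIndex> element and uses Lucene's StandardAnalyzer
as the value; the index path is only illustrative and the real file will contain more settings.

```xml
<!-- Sketch only: the path value and the analyzer choice are assumptions for illustration -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <!-- Default analyzer, used for node scope indexing and searching -->
    <param name="analyzer" value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
</SearchIndex>
```

With the proposal in this issue, a query such as jcr:contains(.,'foo') would still go through
this analyzer, while jcr:contains(@myprop,'foo') would use the analyzer assigned to myprop, if any.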

A possible indexing_configuration.xml snippet is shown below. Also note the possible enhancement
(not sure whether this implementation will have it, because it requires a lot of filter factories
and is probably out of scope). Adding custom filters which do not need a factory might be
easier.

<analyzers>
    <analyzer name="fr" class="org.apache.lucene.analysis.fr.FrenchAnalyzer"/>
    <analyzer name="de" class="org.apache.lucene.analysis.de.GermanAnalyzer"/>
    <analyzer name="compound" class="org.apache.lucene.analysis.SimpleAnalyzer">
        <filter class="jr.StopFilterFactory" words="stopwords.txt"/>
        <filter class="jr.EdgeNGramTokenizerFactory" side="front" minGram="1" maxGram="2"/>
    </analyzer>
</analyzers>

<index-rule nodeType="nt:unstructured">
    <property analyzer="fr">bode_fr</property>
    <property analyzer="de">bode_de</property>
</index-rule>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

