jackrabbit-dev mailing list archives

From "Ard Schrijvers (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-1079) Extend the IndexingConfiguration to allow configuration of reuseable analyzers
Date Tue, 28 Aug 2007 09:47:30 GMT

    [ https://issues.apache.org/jira/browse/JCR-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523206 ]

Ard Schrijvers commented on JCR-1079:
-------------------------------------

> That's OK with me, but I think being able to configure an analyzer in an index rule also
> seems useful to me.

That is fine with me, but we have to realize that I cannot distinguish between setting it
for a property in a single "index-rule" and setting it globally, as I described. The reason
is that when analyzing, or when parsing a query for a field, all I know in the analyzer is
the string representation (JCR-style name) of the given property.
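
To illustrate the limitation, here is a minimal Java sketch of a lookup that is keyed only by
the property name. It is not Jackrabbit or Lucene code: the class PropertyAnalyzerRegistry and
its methods are hypothetical, analyzers are represented by their configured names, and the
"last registration wins" behaviour is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Jackrabbit API): resolves an analyzer name
 * from nothing but the JCR-style property name.
 */
final class PropertyAnalyzerRegistry {

    // Property name -> analyzer name. The node type of the index-rule is not part of the key.
    private final Map<String, String> analyzerByProperty = new HashMap<>();
    private final String defaultAnalyzer;

    PropertyAnalyzerRegistry(String defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    /**
     * Registers an analyzer for a property. The nodeType argument is accepted
     * but ignored, because only the property name is known at analysis time.
     * Assumption: a later registration for the same property replaces an earlier one.
     */
    void register(String nodeType, String propertyName, String analyzerName) {
        analyzerByProperty.put(propertyName, analyzerName);
    }

    /** Returns the configured analyzer name, or the default if none is configured. */
    String analyzerFor(String propertyName) {
        return analyzerByProperty.getOrDefault(propertyName, defaultAnalyzer);
    }

    public static void main(String[] args) {
        PropertyAnalyzerRegistry registry = new PropertyAnalyzerRegistry("default");
        registry.register("nt:unstructured", "bode_fr", "fr");
        registry.register("nt:unstructured", "bode_de", "de");
        System.out.println(registry.analyzerFor("bode_fr"));
        System.out.println(registry.analyzerFor("title"));
    }
}
```

Because the node type never enters the key, an analyzer set for bode_fr inside one index-rule
effectively applies to every property named bode_fr, which is the same as a global setting.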

If that is fine with you I will add this configuration option and write documentation about
it.

> Extend the IndexingConfiguration to allow configuration of reuseable analyzers
> ------------------------------------------------------------------------------
>
>                 Key: JCR-1079
>                 URL: https://issues.apache.org/jira/browse/JCR-1079
>             Project: Jackrabbit
>          Issue Type: New Feature
>    Affects Versions: 1.3.1
>            Reporter: Ard Schrijvers
>            Priority: Minor
>             Fix For: 1.4
>
>
> An XML block of analyzers should be configurable in indexing_configuration.xml. In each
> <index-rule>, an analyzer can be assigned to a property, which means that the property
> will be analyzed with that specific analyzer. First of all, this enables multilingual
> indexing.
> Documentation needs to be added explaining the difference between searching in the node
> scope [jcr:contains(.,'foo')] and in a single property [jcr:contains(@myprop,'foo')]. The
> node scope will always be searched and indexed with the default analyzer, which can be
> configured in the <SearchIndex> element of workspace.xml.
> Below, a possible indexing_configuration.xml snippet is shown. Also note the possible
> enhancement (not sure whether this implementation will have it, because it requires a lot
> of filter factories and is probably out of scope). Adding custom filters which do not need
> a factory might be easier.
> <analyzers>
>     <analyzer name="fr" class="org.apache.lucene.analysis.fr.FrenchAnalyzer"/>
>     <analyzer name="de" class="org.apache.lucene.analysis.de.GermanAnalyzer"/>
>     <analyzer name="compound" class="org.apache.lucene.analysis.SimpleAnalyzer">
>         <filter class="jr.StopFilterFactory" words="stopwords.txt"/>
>         <filter class="jr.EdgeNGramTokenizerFactory" side="front" minGram="1" maxGram="2"/>
>     </analyzer>
> </analyzers>
> <index-rule nodeType="nt:unstructured">
>     <property analyzer="fr">bode_fr</property>
>     <property analyzer="de">bode_de</property>
> </index-rule>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

