jackrabbit-dev mailing list archives

From "Ard Schrijvers (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-1079) Extend the IndexingConfiguration to allow configuration of reuseable analyzers
Date Mon, 03 Sep 2007 07:19:19 GMT

    [ https://issues.apache.org/jira/browse/JCR-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12524452 ]

Ard Schrijvers commented on JCR-1079:
-------------------------------------

> no, I think it's better to have it symmetric. If they can only be used globally then you
> should only be allowed to configure them globally.

I also think this is best, since anything else would be very confusing. Analyzers can only
be configured globally because of the tokenStream method of the analyzer:

    public TokenStream tokenStream(String fieldName, Reader reader)

This method is used both for indexing *and* for parsing search queries, and the only thing
I can distinguish on is the String fieldName (the string representation of the property's
QName). There is no way to know which index-rule it applies to; hence the global configuration.
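To make the constraint concrete, here is a minimal, self-contained Java sketch. It does not use Lucene: analyzers are represented by plain strings, and `GlobalAnalyzerRegistry`, `register`, `analyzerFor` and the node type `my:article` are hypothetical names. Because the lookup receives only the field name (just like tokenStream(fieldName, reader)), two index-rules that assign different analyzers to the same property name cannot both be honoured, so the sketch rejects the second assignment. Lucene's own PerFieldAnalyzerWrapper dispatches on the field name in the same way.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: analyzer lookup keyed only on the field name, because
 * that is the only context tokenStream(fieldName, reader) receives.
 */
public class GlobalAnalyzerRegistry {

    // field name (string form of the property QName) -> analyzer name
    private final Map<String, String> analyzerByField = new HashMap<>();
    private final String defaultAnalyzer;

    public GlobalAnalyzerRegistry(String defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    /**
     * Registers an analyzer for a property as declared in some index-rule.
     * The node type is accepted only to show that it cannot be part of the key.
     */
    public void register(String nodeType, String fieldName, String analyzer) {
        String existing = analyzerByField.putIfAbsent(fieldName, analyzer);
        if (existing != null && !existing.equals(analyzer)) {
            throw new IllegalStateException("field " + fieldName + " already uses analyzer "
                    + existing + "; cannot assign " + analyzer + " for " + nodeType);
        }
    }

    /** Mirrors the only information available inside tokenStream(fieldName, reader). */
    public String analyzerFor(String fieldName) {
        return analyzerByField.getOrDefault(fieldName, defaultAnalyzer);
    }

    public static void main(String[] args) {
        GlobalAnalyzerRegistry registry = new GlobalAnalyzerRegistry("standard");
        registry.register("nt:unstructured", "bode_fr", "fr");
        System.out.println(registry.analyzerFor("bode_fr")); // fr
        System.out.println(registry.analyzerFor("title"));   // standard
        try {
            // A second rule for another node type wants a different analyzer for the same field.
            registry.register("my:article", "bode_fr", "de");
        } catch (IllegalStateException e) {
            System.out.println("conflict: " + e.getMessage());
        }
    }
}
```

Rejecting the conflicting assignment is one possible design choice; silently letting the last rule win would make the result depend on rule order, which is exactly the kind of confusion a global-only configuration avoids.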



> Extend the IndexingConfiguration to allow configuration of reuseable analyzers
> ------------------------------------------------------------------------------
>
>                 Key: JCR-1079
>                 URL: https://issues.apache.org/jira/browse/JCR-1079
>             Project: Jackrabbit
>          Issue Type: New Feature
>    Affects Versions: 1.3.1
>            Reporter: Ard Schrijvers
>            Priority: Minor
>             Fix For: 1.4
>
>
> It should be possible to configure a block of analyzers in indexing_configuration.xml. In
> each <index-rule>, an analyzer can be assigned to a property, which means that property
> will be analyzed with that specific analyzer. First and foremost, this enables multilingual
> indexing.
> Documentation needs to be added explaining the difference between searching in the node scope
> [jcr:contains(.,'foo')] and in a specific property [jcr:contains(@myprop,'foo')]. The node scope
> will always be indexed and searched with the default analyzer, which can be configured in
> workspace.xml in the <SearchIndex> element.
> A possible indexing_configuration.xml snippet is shown below. Also note the possible
> enhancement in the "compound" analyzer (not sure whether this implementation will have it,
> because it requires a lot of filter factories and is probably out of scope). Adding custom
> filters that do not need a factory might be easier.
> <analyzers>
>     <analyzer name="fr" class="org.apache.lucene.analysis.fr.FrenchAnalyzer"/>
>     <analyzer name="de" class="org.apache.lucene.analysis.de.GermanAnalyzer"/>
>     <analyzer name="compound" class="org.apache.lucene.analysis.SimpleAnalyzer">
>         <filter class="jr.StopFilterFactory" words="stopwords.txt"/>
>         <filter class="jr.EdgeNGramTokenizerFactory" side="front" minGram="1" maxGram="2"/>
>     </analyzer>
> </analyzers>
> <index-rule nodeType="nt:unstructured">
>     <property analyzer="fr">bode_fr</property>
>     <property analyzer="de">bode_de</property>
> </index-rule>
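As a rough illustration of how the proposed snippet could be consumed, here is a self-contained Java sketch using only the JDK's DOM parser. It is not Jackrabbit's implementation: the class `AnalyzerConfigReader`, its methods, and the `<configuration>` root element wrapped around the snippet are assumptions. It collects analyzer classes, their filter classes (stored as names only, nothing is instantiated), and the property-to-analyzer mapping, and then applies the rule from the issue that node-scope searches always use the default analyzer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical reader for the proposed <analyzers> block and analyzer="..." attributes. */
public class AnalyzerConfigReader {

    // The snippet from the issue, wrapped in an assumed <configuration> root element.
    static final String SAMPLE =
          "<configuration><analyzers>"
        + "<analyzer name=\"fr\" class=\"org.apache.lucene.analysis.fr.FrenchAnalyzer\"/>"
        + "<analyzer name=\"de\" class=\"org.apache.lucene.analysis.de.GermanAnalyzer\"/>"
        + "<analyzer name=\"compound\" class=\"org.apache.lucene.analysis.SimpleAnalyzer\">"
        + "<filter class=\"jr.StopFilterFactory\" words=\"stopwords.txt\"/>"
        + "<filter class=\"jr.EdgeNGramTokenizerFactory\" side=\"front\" minGram=\"1\" maxGram=\"2\"/>"
        + "</analyzer></analyzers>"
        + "<index-rule nodeType=\"nt:unstructured\">"
        + "<property analyzer=\"fr\">bode_fr</property>"
        + "<property analyzer=\"de\">bode_de</property>"
        + "</index-rule></configuration>";

    final Map<String, String> analyzerClasses = new LinkedHashMap<>();       // name -> class
    final Map<String, List<String>> analyzerFilters = new LinkedHashMap<>(); // name -> filter classes
    final Map<String, String> propertyAnalyzers = new LinkedHashMap<>();     // property -> name

    void read(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse indexing configuration", e);
        }
        NodeList analyzers = doc.getElementsByTagName("analyzer");
        for (int i = 0; i < analyzers.getLength(); i++) {
            Element analyzer = (Element) analyzers.item(i);
            String name = analyzer.getAttribute("name");
            analyzerClasses.put(name, analyzer.getAttribute("class"));
            List<String> filters = new ArrayList<>();
            NodeList filterNodes = analyzer.getElementsByTagName("filter");
            for (int j = 0; j < filterNodes.getLength(); j++) {
                filters.add(((Element) filterNodes.item(j)).getAttribute("class"));
            }
            analyzerFilters.put(name, filters);
        }
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            String analyzerName = property.getAttribute("analyzer"); // "" when absent
            if (analyzerName.isEmpty()) {
                continue; // no analyzer attribute: the property keeps the default analyzer
            }
            if (!analyzerClasses.containsKey(analyzerName)) {
                throw new IllegalArgumentException("undeclared analyzer: " + analyzerName);
            }
            propertyAnalyzers.put(property.getTextContent().trim(), analyzerName);
        }
    }

    /** Analyzer for jcr:contains(scope, ...): the node scope "." always uses the default. */
    String analyzerForContains(String scope, String defaultAnalyzer) {
        if (scope.equals(".")) {
            return defaultAnalyzer;
        }
        if (!scope.startsWith("@")) {
            throw new IllegalArgumentException("unsupported scope: " + scope);
        }
        return propertyAnalyzers.getOrDefault(scope.substring(1), defaultAnalyzer);
    }

    public static void main(String[] args) {
        AnalyzerConfigReader reader = new AnalyzerConfigReader();
        reader.read(SAMPLE);
        System.out.println(reader.propertyAnalyzers);                          // {bode_fr=fr, bode_de=de}
        System.out.println(reader.analyzerFilters.get("compound").size());     // 2
        System.out.println(reader.analyzerForContains(".", "default"));        // default
        System.out.println(reader.analyzerForContains("@bode_de", "default")); // de
    }
}
```

Under these assumptions, jcr:contains(@bode_de,'foo') is analyzed with the "de" analyzer, while jcr:contains(.,'foo') still goes through the default analyzer configured on <SearchIndex>.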

-- 
This message is automatically generated by JIRA.

