jackrabbit-dev mailing list archives

From "fabrizio giustina (JIRA)" <j...@apache.org>
Subject [jira] Updated: (JCR-2622) Index analizers that extends StandardAnalyzer need to implement reusableTokenStream() since jackrabbit 2.1
Date Sat, 01 Jan 2011 19:51:46 GMT

     [ https://issues.apache.org/jira/browse/JCR-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fabrizio giustina updated JCR-2622:
-----------------------------------

    Summary: Index analizers that extends StandardAnalyzer need to implement reusableTokenStream()
since jackrabbit 2.1  (was: Configured index analizers not working in jackrabbit 2.1 and 2.2)

Looks like I spoke too soon: after a deeper analysis I found that the problem can be fixed
in the analyzer class and doesn't require a fix in jackrabbit itself.

The change in JCR-2505 actually broke index analyzers that don't implement the reusableTokenStream()
method properly.
Any analyzer that extends org.apache.lucene.analysis.standard.StandardAnalyzer worked
properly in jackrabbit 2.0, which used the tokenStream() method only. Since jackrabbit
2.1, such analyzers can no longer rely on the superclass implementation of reusableTokenStream();
they have to implement that method properly themselves.

The correct solution is probably not to extend StandardAnalyzer anymore (its reusableTokenStream()
method cannot be overridden properly because it relies on private fields) but to extend a plain org.apache.lucene.analysis.Analyzer
and reimplement the tokenStream() method from scratch.
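
To make the mechanism concrete, here is a minimal, self-contained model of the two call paths. It deliberately does not use Lucene: BaseAnalyzer, StockAnalyzer, BrokenAccentAnalyzer and FixedAccentAnalyzer are hypothetical stand-ins, and the sketch assumes what this message describes, namely that a plain Analyzer's reuse path falls back to tokenStream() while StandardAnalyzer's reuse path builds its own chain and ignores subclass overrides.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT the Lucene classes.
class ReusePathDemo {

    // Models a plain Analyzer: the reuse path falls back to tokenStream().
    abstract static class BaseAnalyzer {
        abstract List<String> tokenStream(String text);

        List<String> reusableTokenStream(String text) {
            return tokenStream(text);
        }
    }

    // Models StandardAnalyzer (assumption): its reuse path builds the stock
    // chain directly and never calls tokenStream(), bypassing overrides.
    static class StockAnalyzer extends BaseAnalyzer {
        @Override
        List<String> tokenStream(String text) {
            return split(text);
        }

        @Override
        List<String> reusableTokenStream(String text) {
            return split(text);
        }
    }

    // The pattern from the report: only tokenStream() is overridden.
    static class BrokenAccentAnalyzer extends StockAnalyzer {
        @Override
        List<String> tokenStream(String text) {
            return foldAccents(super.tokenStream(text));
        }
    }

    // The suggested fix: extend the plain base and build the whole chain here.
    static class FixedAccentAnalyzer extends BaseAnalyzer {
        @Override
        List<String> tokenStream(String text) {
            return foldAccents(split(text));
        }
    }

    // Whitespace tokenizer standing in for the real tokenizer chain.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Accent folding: strips combining marks after NFD decomposition.
    static List<String> foldAccents(List<String> tokens) {
        List<String> folded = new ArrayList<>();
        for (String token : tokens) {
            String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
            folded.add(decomposed.replaceAll("\\p{M}+", ""));
        }
        return folded;
    }

    public static void main(String[] args) {
        String input = "t\u00e8st"; // "test" with an accented e, escaped for portability
        BaseAnalyzer broken = new BrokenAccentAnalyzer();
        BaseAnalyzer fixed = new FixedAccentAnalyzer();
        // Each line reports whether the accent was folded on that call path.
        System.out.println("broken, tokenStream():         " + broken.tokenStream(input).contains("test"));
        System.out.println("broken, reusableTokenStream(): " + broken.reusableTokenStream(input).contains("test"));
        System.out.println("fixed,  tokenStream():         " + fixed.tokenStream(input).contains("test"));
        System.out.println("fixed,  reusableTokenStream(): " + fixed.reusableTokenStream(input).contains("test"));
    }
}
```

In this model the broken analyzer folds the accent only when it is called through tokenStream(); a caller that switches to reusableTokenStream() silently gets the unfiltered tokens. The fixed analyzer behaves the same on both paths because it inherits the delegating default.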

So the problem looks like a bug in all the analyzers I was using, but in a part that was never
used by jackrabbit before the change in version 2.1. The issue can be closed.


> Index analizers that extends StandardAnalyzer need to implement reusableTokenStream() since jackrabbit 2.1
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: JCR-2622
>                 URL: https://issues.apache.org/jira/browse/JCR-2622
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>    Affects Versions: 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0
>            Reporter: fabrizio giustina
>            Priority: Critical
>         Attachments: JCR-2622-tests_and_patch.diff
>
>
> I just tried migrating an existing project which was using jackrabbit 2.0.0 to 2.1.0.
> We have an index analyzer configured which filters accented chars: 
> {code}
> public class ItalianSnowballAnalyzer extends StandardAnalyzer
> {
>     @Override
>     public TokenStream tokenStream(String fieldName, Reader reader)
>     {
>         return new ISOLatin1AccentFilter(new LowerCaseFilter(super.tokenStream(fieldName, reader)));
>     }
> }
> {code}
> The project has a good number of unit tests: an XML file is loaded into a memory-only jackrabbit
> repository and several queries are checked against expected results.
> After migrating to 2.1.0, none of the tests that rely on the index analyzer work anymore;
> for example, searching for "test" no longer finds nodes containing "tèst".
> Upgrading to jackrabbit 2.1.0 is the only change made (no changes to the configuration, code,
> or other libraries at all). Rolling back to the 2.0.0 dependency is enough to make all the
> tests work again.
> I've checked the changes in 2.1 but couldn't find any apparently related change. Also
> note that I was already using the patch from JCR-2504 before (configuration loading works
> fine in the unpatched 2.1). Another point is that the configured IndexAnalyzer is still
> actually called during our tests (checked in debug mode).
> Any idea?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

