lucene-java-user mailing list archives

From Steven Schlansker <ste...@likeness.com>
Subject Re: Seemingly very difficult to wrap an Analyzer with CharFilter
Date Fri, 14 Jun 2013 17:01:01 GMT

On Jun 12, 2013, at 5:26 PM, Michael Sokolov <msokolov@safaribooksonline.com> wrote:

> On 6/12/2013 7:02 PM, Steven Schlansker wrote:
>> On Jun 12, 2013, at 3:44 PM, Michael Sokolov <msokolov@safaribooksonline.com> wrote:
>> 
>>> You may not have noticed that CharFilter extends Reader.  The expected pattern
>>> here is that you chain instances together -- your CharFilter should act as *input* to the
>>> Analyzer, I think.  Don't think in terms of extending these analysis classes (except the base
>>> ones designed for it): compose them so that each consumes the one before it.
>>> 
>> Hi Mike,
>> 
>> Hm, that may work out.  I am a little surprised because I thought the intention is
>> that you set the Analyzer up as part of the configuration, and when you add documents, the
>> analyzer takes care of all text processing.  In particular this means that now I have to ensure
>> that the same transformation is done at query time, and I thought the analyzer abstraction
>> was supposed to avoid this.
>> 
>> But if this is how it should be done, it could work.  Thanks for the pointer.
>> 
>> Steven
>> 
>> 
> Um I'm sorry I was in a hurry and forgot to think... I went back and looked at my code
> and found the pattern was different from what I was thinking.  I have:
> 
> public final class DefaultAnalyzer extends Analyzer {
> 
>    @Override
>    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
>        Tokenizer tokenizer = new StandardTokenizer(IndexConfiguration.LUCENE_VERSION, reader);
>        TokenStream tokenStream = new LowerCaseFilter(IndexConfiguration.LUCENE_VERSION, tokenizer);
>        // ASCIIFoldingFilter
>        // Stemming
>        return new TokenStreamComponents(tokenizer, tokenStream);
>    }
> 
> }
> 
> You were exactly right that subclassing Analyzer and overriding the initReader is the
> way to go.
> The composition I was talking about can happen among filters.  I guess you have to duplicate
> the internals of StandardAnalyzer, but I don't think there's all that much in there?

You are right, it is not that hard.  It is only that my goal was to have "a StandardAnalyzer
with a CharFilter" and I hate unnecessarily duplicating code :-)

But it seems that this is my only course of action.
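
For anyone finding this thread later, here is a rough sketch of the shape of that approach. It is plain java.io, not Lucene: MiniAnalyzer, DashToSpaceReader and DashSplittingAnalyzer are made-up names that only mimic how an initReader hook lets a subclass wrap the incoming Reader, so the same character-level transformation runs wherever the analyzer is used (index time and query time alike). A real version would extend Lucene's Analyzer and return a CharFilter from initReader instead.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the pattern discussed in this thread. None of these
// classes are Lucene classes; they only mimic the shape of an initReader hook.
final class InitReaderSketch {

    /** Hypothetical char-level filter: a Reader that wraps another Reader. */
    static final class DashToSpaceReader extends FilterReader {
        DashToSpaceReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int c = super.read();
            return c == '-' ? ' ' : c;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            // n is -1 at end of stream, in which case the loop body never runs.
            for (int i = off; i < off + n; i++) {
                if (buf[i] == '-') {
                    buf[i] = ' ';
                }
            }
            return n;
        }
    }

    /** Hypothetical analyzer base class with an initReader hook. */
    static class MiniAnalyzer {
        /** Hook: subclasses may wrap the raw reader; the default returns it unchanged. */
        protected Reader initReader(String fieldName, Reader reader) {
            return reader;
        }

        /** Applies the hook, then lowercases and splits on whitespace. */
        public final List<String> tokens(String fieldName, String text) {
            List<String> out = new ArrayList<>();
            StringBuilder current = new StringBuilder();
            try (Reader r = initReader(fieldName, new StringReader(text))) {
                int c;
                while ((c = r.read()) != -1) {
                    if (Character.isWhitespace(c)) {
                        flush(current, out);
                    } else {
                        current.append(Character.toLowerCase((char) c));
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            flush(current, out);
            return out;
        }

        private static void flush(StringBuilder current, List<String> out) {
            if (current.length() > 0) {
                out.add(current.toString());
                current.setLength(0);
            }
        }
    }

    /** Same token chain as the base class, with the char filter installed via the hook. */
    static final class DashSplittingAnalyzer extends MiniAnalyzer {
        @Override
        protected Reader initReader(String fieldName, Reader reader) {
            return new DashToSpaceReader(reader);
        }
    }

    public static void main(String[] args) {
        System.out.println(new MiniAnalyzer().tokens("body", "Char-Filter Demo"));
        System.out.println(new DashSplittingAnalyzer().tokens("body", "Char-Filter Demo"));
    }
}
```

Because the filter is attached inside the analyzer rather than applied by the caller, any code that uses DashSplittingAnalyzer gets the same pre-tokenization step without having to remember it.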

> 
> I used AnalyzerWrapper for something -- um switching between multiple analyzers based
> on the input.  But it doesn't allow you to do anything with the internals of the analyzer(s)
> it wraps.

Yeah, this is a little unfortunate.  Just being able to override initReader would be nice.

Thanks for the pointers,
Steven

