lucene-java-user mailing list archives

From Chris Male <gento...@gmail.com>
Subject Re: Lucene 4.0 PerFieldAnalyzerWrapper question
Date Wed, 26 Sep 2012 01:31:33 GMT
Mike,

On Wed, Sep 26, 2012 at 1:05 PM, Mike O'Leary <tmoleary@uw.edu> wrote:

> Hi Chris,
> So if I change my analyzer to inherit from AnalyzerWrapper, I need to
> define a getWrappedAnalyzer function and a wrapComponents function. I think
> getWrappedAnalyzer is straightforward, but I don't understand who is
> calling wrapComponents and for what purpose, so I don't know how to define
> it. This is my modified analyzer code with ??? in the places I don't know
> how to define.
> Thanks,
> Mike
>
> public class MyPerFieldAnalyzer extends AnalyzerWrapper {
>   Map<String, Analyzer> _analyzerMap = new HashMap<String, Analyzer>();
>   Analyzer _defaultAnalyzer;
>
>   public MyPerFieldAnalyzer() {
>     _analyzerMap.put("IDNumber", new KeywordAnalyzer());
>     ...
>     ...
>
>     _defaultAnalyzer = new CustomAnalyzer();
>   }
>
>   @Override
>   protected Analyzer getWrappedAnalyzer(String fieldName) {
>     Analyzer analyzer;
>
>     if (_analyzerMap.containsKey(fieldName)) {
>       analyzer = _analyzerMap.get(fieldName);
>     } else {
>       analyzer = _defaultAnalyzer;
>     }
>     return analyzer;
>   }
>

I'm not sure if you have missed it but PerFieldAnalyzerWrapper supports
having a default Analyzer.
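
To illustrate the idea of a default, here is a minimal, dependency-free sketch of a per-field lookup with a fallback. PerFieldRegistry and forField are hypothetical names made up for this example, not Lucene classes, and plain strings stand in for Analyzer instances.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: maps field names to values
// and falls back to a default for fields that have no explicit entry.
final class PerFieldRegistry<T> {
    private final T defaultValue;
    private final Map<String, T> perField;

    PerFieldRegistry(T defaultValue, Map<String, T> perField) {
        this.defaultValue = defaultValue;
        this.perField = new HashMap<>(perField); // defensive copy
    }

    // Returns the entry registered for fieldName, or the default if there is none.
    T forField(String fieldName) {
        T value = perField.get(fieldName);
        return value != null ? value : defaultValue;
    }
}
```

With "IDNumber" registered as "keyword" and "custom" as the default, forField("IDNumber") gives "keyword" and any other field name gives "custom". That is the same shape as passing a default Analyzer plus a Map to PerFieldAnalyzerWrapper, so a hand-written if/else in getWrappedAnalyzer should not be needed.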


>
>   @Override
>   public TokenStreamComponents wrapComponents(String fieldname,
>  TokenStreamComponents components) {
>     Tokenizer tokenizer = ???;
>     TokenStream tokenStream = ???;
>     return new TokenStreamComponents(tokenizer, tokenStream);
>   }
> }
>

wrapComponents is useful when you need to change the components retrieved
from the wrapped Analyzer, for example by adding a new Tokenizer or
TokenFilter.  You don't need to do that here, so you can just return the
components parameter unchanged.
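
As for who calls wrapComponents: roughly, the wrapper's own createComponents does, handing it whatever the wrapped Analyzer produced. The sketch below models that call chain in plain Java so it runs without Lucene on the classpath. Every Mini* class, NamedAnalyzer and describe are hypothetical stand-ins invented for illustration; the real classes also take a Reader and deal in Tokenizer/TokenStream objects, which are left out here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for TokenStreamComponents: it only carries a description.
final class MiniComponents {
    final String description;
    MiniComponents(String description) { this.description = description; }
}

// Hypothetical stand-in for Analyzer: callers use the final entry point,
// subclasses only supply createComponents.
abstract class MiniAnalyzer {
    protected abstract MiniComponents createComponents(String fieldName);

    // Plays the role of the final tokenStream method.
    final String describe(String fieldName) {
        return createComponents(fieldName).description;
    }
}

// Hypothetical stand-in for AnalyzerWrapper: createComponents is final
// and is the one place that calls both hooks.
abstract class MiniAnalyzerWrapper extends MiniAnalyzer {
    protected abstract MiniAnalyzer getWrappedAnalyzer(String fieldName);
    protected abstract MiniComponents wrapComponents(String fieldName, MiniComponents components);

    @Override
    protected final MiniComponents createComponents(String fieldName) {
        MiniComponents inner = getWrappedAnalyzer(fieldName).createComponents(fieldName);
        return wrapComponents(fieldName, inner);
    }
}

// A leaf analyzer that just labels its output with its own name.
final class NamedAnalyzer extends MiniAnalyzer {
    private final String name;
    NamedAnalyzer(String name) { this.name = name; }

    @Override
    protected MiniComponents createComponents(String fieldName) {
        return new MiniComponents(name + " components for " + fieldName);
    }
}

// Counterpart of MyPerFieldAnalyzer in this model.
final class MiniPerFieldAnalyzer extends MiniAnalyzerWrapper {
    private final Map<String, MiniAnalyzer> analyzerMap = new HashMap<>();
    private final MiniAnalyzer defaultAnalyzer = new NamedAnalyzer("custom");

    MiniPerFieldAnalyzer() {
        analyzerMap.put("IDNumber", new NamedAnalyzer("keyword"));
    }

    @Override
    protected MiniAnalyzer getWrappedAnalyzer(String fieldName) {
        MiniAnalyzer analyzer = analyzerMap.get(fieldName);
        return analyzer != null ? analyzer : defaultAnalyzer;
    }

    @Override
    protected MiniComponents wrapComponents(String fieldName, MiniComponents components) {
        return components; // nothing to add, so pass the wrapped result through
    }
}
```

In this model, new MiniPerFieldAnalyzer().describe("IDNumber") goes through the keyword analyzer and any other field goes through the custom one. The point is that the per-field choice is made entirely in getWrappedAnalyzer, so wrapComponents does not need to build a Tokenizer or TokenStream of its own.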


>
> -----Original Message-----
> From: Chris Male [mailto:gento0nz@gmail.com]
> Sent: Tuesday, September 25, 2012 5:34 PM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene 4.0 PerFieldAnalyzerWrapper question
>
> Ah I see.
>
> The problem is that we don't really encourage wrapping of Analyzers.  Your
> Analyzer wraps a PerFieldAnalyzerWrapper, so it needs to extend
> AnalyzerWrapper, not Analyzer.  AnalyzerWrapper handles the
> createComponents call and just requires you to give it the Analyzer(s)
> you've wrapped through getWrappedAnalyzer.
>
> You can avoid all this entirely, of course, by not extending Analyzer and
> just instantiating a PerFieldAnalyzerWrapper directly instead of your
> MyPerFieldAnalyzer.
>
> On Wed, Sep 26, 2012 at 12:25 PM, Mike O'Leary <tmoleary@uw.edu> wrote:
>
> > Hi Chris,
> > In a nutshell, my question is, what should I put in place of ??? to
> > make this into a Lucene 4.0 analyzer?
> >
> > public class MyPerFieldAnalyzer extends Analyzer {
> >   PerFieldAnalyzerWrapper _analyzer;
> >
> >   public MyPerFieldAnalyzer() {
> >     Map<String, Analyzer> analyzerMap = new HashMap<String, Analyzer>();
> >
> >     analyzerMap.put("IDNumber", new KeywordAnalyzer());
> >     ...
> >     ...
> >
> >     _analyzer = new PerFieldAnalyzerWrapper(new CustomAnalyzer(), analyzerMap);
> >   }
> >
> >   @Override
> >   public TokenStreamComponents createComponents(String fieldname, Reader reader) {
> >     Tokenizer source = ???;
> >     TokenStream stream = _analyzer.tokenStream(fieldname, reader);
> >     return new TokenStreamComponents(source, stream);
> >   }
> > }
> >
> > I must be missing something obvious. Can you tell me what it is?
> > Thanks,
> > Mike
> >
> > -----Original Message-----
> > From: Chris Male [mailto:gento0nz@gmail.com]
> > Sent: Tuesday, September 25, 2012 5:18 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Lucene 4.0 PerFieldAnalyzerWrapper question
> >
> > Hi Mike,
> >
> > I don't really understand what problem you're having.
> >
> > PerFieldAnalyzerWrapper, like all AnalyzerWrappers, uses
> > Analyzer.PerFieldReuseStrategy which means it caches the
> > TokenStreamComponents per field.  The TokenStreamComponents cached are
> > created by retrieving the wrapped Analyzer through
> > AnalyzerWrapper.getWrappedAnalyzer(String) and calling createComponents.
> > In PerFieldAnalyzerWrapper, getWrappedAnalyzer pulls the Analyzer
> > from the Map you provide.
> >
> > Consequently to use your custom Analyzers and KeywordAnalyzer, all you
> > need to do is define your custom Analyzer using the new Analyzer API
> > (that is using TokenStreamComponents), create your Map from that
> > Analyzer and KeywordAnalyzer and pass it into PerFieldAnalyzerWrapper.
> > This seems to be what you're doing in your code sample.
> >
> > Are you able to expand on the problem you're encountering?
> >
> > On Wed, Sep 26, 2012 at 11:57 AM, Mike O'Leary <tmoleary@uw.edu> wrote:
> >
> > > I am updating an analyzer that uses a particular configuration of
> > > the PerFieldAnalyzerWrapper to work with Lucene 4.0. A few of the
> > > fields use a custom analyzer and StandardTokenizer and the other
> > > fields use the KeywordAnalyzer and KeywordTokenizer. The older
> > > version of the analyzer looks like this:
> > >
> > > public class MyPerFieldAnalyzer extends Analyzer {
> > >   PerFieldAnalyzerWrapper _analyzer;
> > >
> > >   public MyPerFieldAnalyzer() {
> > >     Map<String, Analyzer> analyzerMap = new HashMap<String, Analyzer>();
> > >
> > >     analyzerMap.put("IDNumber", new KeywordAnalyzer());
> > >     ...
> > >     ...
> > >
> > >     _analyzer = new PerFieldAnalyzerWrapper(new CustomAnalyzer(), analyzerMap);
> > >   }
> > >
> > >   @Override
> > >   public TokenStream tokenStream(String fieldname, Reader reader) {
> > >     TokenStream stream = _analyzer.tokenStream(fieldname, reader);
> > >     return stream;
> > >   }
> > > }
> > >
> > > In older versions of Lucene it is necessary to define a tokenStream
> > > function, but in 4.0 it is not (in fact, tokenStream is declared
> > > final, so you can't). Instead, it is necessary to define a
> > > createComponents function that takes the same arguments as the
> > > tokenStream function and returns a TokenStreamComponents object. The
> > > TokenStreamComponents constructor has a Tokenizer argument and a
> > > TokenStream argument. I assume I can just use the same code to
> > > provide the TokenStream object as was used in the older analyzer's
> > > tokenStream function, but I don't see how to provide a Tokenizer
> > > object, unless it is by creating a separate map of field names to
> > > Tokenizers that works the same way the analyzer map does. Is that
> > > the best way to do this, or is there a better way? For example,
> > > would it be better to inherit from AnalyzerWrapper instead of from
> > > Analyzer? In that case I would need to define getWrappedAnalyzer and
> > > wrapComponents functions. I think in that case I would still need
> > > to put the same kind of logic in the wrapComponents function that
> > > specifies which tokenizer to use with which field, though. It looks
> > > like the PerFieldAnalyzerWrapper itself assumes that the same
> > > tokenizer will be used with all fields, as its wrapComponents
> > > function ignores the fieldname parameter. I would appreciate any
> > > help in finding out the best way to update this analyzer
> > > and to write the required function(s).
> > >
> > > Thanks,
> > > Mike
> > >
> >
> >
> >
>
>


-- 
Chris Male | Open Source Search Developer | elasticsearch | www.elasticsearch.com
