lucene-java-user mailing list archives

From "Mike O'Leary" <tmole...@uw.edu>
Subject RE: Lucene 4.0 PerFieldAnalyzerWrapper question
Date Wed, 26 Sep 2012 17:55:32 GMT
Hi Chris,
So it sounds like instead of defining a new class that gets instantiated to create an analyzer,
I could just do this:

public class MyPerFieldAnalyzer {
  public static Analyzer getMyPerFieldAnalyzer() {
    Map<String, Analyzer> analyzerMap = new HashMap<String, Analyzer>();

    analyzerMap.put("IDNumber", new KeywordAnalyzer());
    ...
    ...

    return new PerFieldAnalyzerWrapper(new CustomAnalyzer(), analyzerMap);
  }
}

Which is much simpler than all of the things I was thinking I would need to do.
Thanks very much,
Mike

-----Original Message-----
From: Chris Male [mailto:gento0nz@gmail.com] 
Sent: Tuesday, September 25, 2012 6:32 PM
To: java-user@lucene.apache.org
Subject: Re: Lucene 4.0 PerFieldAnalyzerWrapper question

Mike,

On Wed, Sep 26, 2012 at 1:05 PM, Mike O'Leary <tmoleary@uw.edu> wrote:

> Hi Chris,
> So if I change my analyzer to inherit from AnalyzerWrapper, I need to 
> define a getWrappedAnalyzer function and a wrapComponents function. I 
> think getWrappedAnalyzer is straightforward, but I don't understand 
> who is calling wrapComponents and for what purpose, so I don't know 
> how to define it. This is my modified analyzer code with ??? in the 
> places I don't know how to define.
> Thanks,
> Mike
>
> public class MyPerFieldAnalyzer extends AnalyzerWrapper {
>   Map<String, Analyzer> _analyzerMap = new HashMap<String, Analyzer>();
>   Analyzer _defaultAnalyzer;
>
>   public MyPerFieldAnalyzer() {
>     _analyzerMap.put("IDNumber", new KeywordAnalyzer());
>     ...
>     ...
>
>     _defaultAnalyzer = new CustomAnalyzer();
>   }
>
>   @Override
>   protected Analyzer getWrappedAnalyzer(String fieldName) {
>     Analyzer analyzer;
>
>     if (_analyzerMap.containsKey(fieldName)) {
>       analyzer = _analyzerMap.get(fieldName);
>     } else {
>       analyzer = _defaultAnalyzer;
>     }
>     return analyzer;
>   }
>

In case you missed it, PerFieldAnalyzerWrapper already supports a default Analyzer (the first
constructor argument), so you don't need to implement this fallback yourself.


>
>   @Override
>   public TokenStreamComponents wrapComponents(String fieldname, TokenStreamComponents components) {
>     Tokenizer tokenizer = ???;
>     TokenStream tokenStream = ???;
>     return new TokenStreamComponents(tokenizer, tokenStream);
>   }
> }
>

wrapComponents is useful when you need to change the components retrieved from the wrapped
Analyzer, for example by adding a new Tokenizer or TokenFilter.  But you don't need to do this,
and can just return the components parameter unchanged.
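
The delegation described above can be modeled in plain Java. This is only a rough sketch
and does not use Lucene classes: SimpleAnalyzer, PerFieldModel and LowercasingModel are
hypothetical stand-ins, a list of tokens stands in for TokenStreamComponents, and the
per-field caching of components is left out. It shows a map lookup with a default fallback
(the role of getWrappedAnalyzer) and a wrapComponents hook that either returns its argument
unchanged or post-processes it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class PerFieldDemo {

    // Hypothetical stand-in for an analyzer: field text in, tokens out.
    interface SimpleAnalyzer {
        List<String> analyze(String text);
    }

    // Hypothetical stand-in for a wrapper that picks an analyzer per field.
    static class PerFieldModel {
        private final SimpleAnalyzer defaultAnalyzer;
        private final Map<String, SimpleAnalyzer> fieldAnalyzers;

        PerFieldModel(SimpleAnalyzer defaultAnalyzer, Map<String, SimpleAnalyzer> fieldAnalyzers) {
            this.defaultAnalyzer = defaultAnalyzer;
            this.fieldAnalyzers = new HashMap<>(fieldAnalyzers);
        }

        // Role of getWrappedAnalyzer: map lookup with a default fallback.
        SimpleAnalyzer getWrappedAnalyzer(String fieldName) {
            SimpleAnalyzer analyzer = fieldAnalyzers.get(fieldName);
            return analyzer != null ? analyzer : defaultAnalyzer;
        }

        // Role of wrapComponents: passing the argument through is enough
        // when nothing needs to be added to the wrapped analyzer's output.
        List<String> wrapComponents(String fieldName, List<String> components) {
            return components;
        }

        // The wrapper itself drives the two hooks; subclasses never call them.
        final List<String> analyze(String fieldName, String text) {
            List<String> components = getWrappedAnalyzer(fieldName).analyze(text);
            return wrapComponents(fieldName, components);
        }
    }

    // Variant that changes the wrapped output, analogous to adding a filter.
    static class LowercasingModel extends PerFieldModel {
        LowercasingModel(SimpleAnalyzer defaultAnalyzer, Map<String, SimpleAnalyzer> fieldAnalyzers) {
            super(defaultAnalyzer, fieldAnalyzers);
        }

        @Override
        List<String> wrapComponents(String fieldName, List<String> components) {
            List<String> lowered = new ArrayList<>();
            for (String token : components) {
                lowered.add(token.toLowerCase(Locale.ROOT));
            }
            return lowered;
        }
    }

    // Whole value as a single token, similar in spirit to a keyword analyzer.
    static final SimpleAnalyzer KEYWORD = text -> Arrays.asList(text);
    // Split on whitespace; used here as the default for unmapped fields.
    static final SimpleAnalyzer WHITESPACE = text -> Arrays.asList(text.trim().split("\\s+"));

    static Map<String, SimpleAnalyzer> fieldMap() {
        Map<String, SimpleAnalyzer> map = new HashMap<>();
        map.put("IDNumber", KEYWORD);
        return map;
    }

    static PerFieldModel build() {
        return new PerFieldModel(WHITESPACE, fieldMap());
    }

    static PerFieldModel buildLowercasing() {
        return new LowercasingModel(WHITESPACE, fieldMap());
    }

    public static void main(String[] args) {
        System.out.println(build().analyze("IDNumber", "AB 123"));
        System.out.println(build().analyze("body", "Quick brown fox"));
        System.out.println(buildLowercasing().analyze("IDNumber", "AB 123"));
    }
}
```

In this model the "IDNumber" field keeps its value as one token while any other field falls
back to the default, and only the lowercasing variant alters what the wrapped analyzer
produced.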


>
> -----Original Message-----
> From: Chris Male [mailto:gento0nz@gmail.com]
> Sent: Tuesday, September 25, 2012 5:34 PM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene 4.0 PerFieldAnalyzerWrapper question
>
> Ah I see.
>
> The problem is that we don't really encourage wrapping of Analyzers.  
> Your Analyzer wraps a PerFieldAnalyzerWrapper, so it needs to 
> extend AnalyzerWrapper, not Analyzer.  AnalyzerWrapper handles the 
> createComponents call and just requires you to give it the Analyzer(s) 
> you've wrapped through getWrappedAnalyzer.
>
> You can avoid all this entirely, of course, by not extending Analyzer 
> at all and just instantiating a PerFieldAnalyzerWrapper 
> directly instead of your MyPerFieldAnalyzer.
>
> On Wed, Sep 26, 2012 at 12:25 PM, Mike O'Leary <tmoleary@uw.edu> wrote:
>
> > Hi Chris,
> > In a nutshell, my question is, what should I put in place of ??? to 
> > make this into a Lucene 4.0 analyzer?
> >
> > public class MyPerFieldAnalyzer extends Analyzer {
> >   PerFieldAnalyzerWrapper _analyzer;
> >
> >   public MyPerFieldAnalyzer() {
> >     Map<String, Analyzer> analyzerMap = new HashMap<String, Analyzer>();
> >
> >     analyzerMap.put("IDNumber", new KeywordAnalyzer());
> >     ...
> >     ...
> >
> >     _analyzer = new PerFieldAnalyzerWrapper(new CustomAnalyzer(), analyzerMap);
> >   }
> >
> >   @Override
> >   public TokenStreamComponents createComponents(String fieldname, Reader reader) {
> >     Tokenizer source = ???;
> >     TokenStream stream = _analyzer.tokenStream(fieldname, reader);
> >     return new TokenStreamComponents(source, stream);
> >   }
> > }
> >
> > I must be missing something obvious. Can you tell me what it is?
> > Thanks,
> > Mike
> >
> > -----Original Message-----
> > From: Chris Male [mailto:gento0nz@gmail.com]
> > Sent: Tuesday, September 25, 2012 5:18 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Lucene 4.0 PerFieldAnalyzerWrapper question
> >
> > Hi Mike,
> >
> > I don't really understand what problem you're having.
> >
> > PerFieldAnalyzerWrapper, like all AnalyzerWrappers, uses 
> > Analyzer.PerFieldReuseStrategy, which means it caches the 
> > TokenStreamComponents per field.  The cached TokenStreamComponents 
> > are created by retrieving the wrapped Analyzer through 
> > AnalyzerWrapper.getWrappedAnalyzer(fieldName) and calling 
> > createComponents on it.  In PerFieldAnalyzerWrapper, 
> > getWrappedAnalyzer pulls the Analyzer from the Map you provide.
> >
> > Consequently to use your custom Analyzers and KeywordAnalyzer, all 
> > you need to do is define your custom Analyzer using the new Analyzer 
> > API (that is using TokenStreamComponents), create your Map from that 
> > Analyzer and KeywordAnalyzer and pass it into PerFieldAnalyzerWrapper.
> > This seems to be what you're doing in your code sample.
> >
> > Are you able to expand on the problem you're encountering?
> >
> > On Wed, Sep 26, 2012 at 11:57 AM, Mike O'Leary <tmoleary@uw.edu> wrote:
> >
> > > I am updating an analyzer that uses a particular configuration of 
> > > the PerFieldAnalyzerWrapper to work with Lucene 4.0. A few of the 
> > > fields use a custom analyzer and StandardTokenizer and the other 
> > > fields use the KeywordAnalyzer and KeywordTokenizer. The older 
> > > version of the analyzer looks like this:
> > >
> > > public class MyPerFieldAnalyzer extends Analyzer {
> > >   PerFieldAnalyzerWrapper _analyzer;
> > >
> > >   public MyPerFieldAnalyzer() {
> > >     Map<String, Analyzer> analyzerMap = new HashMap<String, Analyzer>();
> > >
> > >     analyzerMap.put("IDNumber", new KeywordAnalyzer());
> > >     ...
> > >     ...
> > >
> > >     _analyzer = new PerFieldAnalyzerWrapper(new CustomAnalyzer(), analyzerMap);
> > >   }
> > >
> > >   @Override
> > >   public TokenStream tokenStream(String fieldname, Reader reader) {
> > >     TokenStream stream = _analyzer.tokenStream(fieldname, reader);
> > >     return stream;
> > >   }
> > > }
> > >
> > > In older versions of Lucene it is necessary to define a 
> > > tokenStream function, but in 4.0 it is not (in fact, tokenStream 
> > > is declared final, so you can't override it). Instead, it is 
> > > necessary to define a createComponents function that takes the 
> > > same arguments as the tokenStream function and returns a 
> > > TokenStreamComponents object. The TokenStreamComponents 
> > > constructor has a Tokenizer argument and a TokenStream argument.
> > >
> > > I assume I can just use the same code to provide the TokenStream 
> > > object as was used in the older analyzer's tokenStream function, 
> > > but I don't see how to provide a Tokenizer object, unless it is by 
> > > creating a separate map of field names to Tokenizers that works 
> > > the same way the analyzer map does. Is that the best way to do 
> > > this, or is there a better way?
> > >
> > > For example, would it be better to inherit from AnalyzerWrapper 
> > > instead of from Analyzer? In that case I would need to define 
> > > getWrappedAnalyzer and wrapComponents functions. I think I would 
> > > still need to put the same kind of logic in the wrapComponents 
> > > function that specifies which tokenizer to use with which field, 
> > > though. It looks like the PerFieldAnalyzerWrapper itself assumes 
> > > that the same tokenizer will be used with all fields, as its 
> > > wrapComponents function ignores the fieldname parameter.
> > >
> > > I would appreciate any help in finding out the best way to update 
> > > this analyzer and to write the required function(s).
> > >
> > > Thanks,
> > > Mike
> > >
> >
> >
> >
> > --
> > Chris Male | Open Source Search Developer | elasticsearch | www.elasticsearch.com <http://www.dutchworks.nl>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>
>
>



