lucene-java-user mailing list archives

From "hui" <...@triplehop.com>
Subject Re: per-field Analyzer (was Re: some requests)
Date Mon, 22 Sep 2003 13:40:05 GMT
Good work, Erik.

Hui

----- Original Message ----- 
From: "Erik Hatcher" <erik@ehatchersolutions.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Saturday, September 20, 2003 4:13 AM
Subject: per-field Analyzer (was Re: some requests)


> On Friday, September 19, 2003, at 07:45  PM, Erik Hatcher wrote:
> > On Friday, September 19, 2003, at 11:15  AM, hui wrote:
> >> 1. Move the Analyzer down to field level from document level so that
> >> some fields can be given a special analyzer. Other fields still use
> >> the default analyzer from the document level.
> >> For example, I do not need to index the numbers for the "content"
> >> field. It helps me reduce the index size a lot when I have some Excel
> >> files. But I always need the "created_date" to be indexed though it
> >> is a number field.
> >>
> >> I know there are some workarounds posted to the group, but I think it
> >> would be a good feature to have.
> >
> > The "workaround" is to write a custom analyzer and have it do the 
> > desired thing per-field.
> >
> > Hmmm.... just thinking out loud here without knowing if this is 
> > possible, but could a generic "wrapper" Analyzer be written that 
> > allows other analyzers to be used under the covers based on a field 
> > name/analyzer mapping?   If so, that would be quite cool and save 
> > folks from having to write custom analyzers as much to handle this 
> > pretty typical use-case.  I'll look into this more in the very near 
> > future personally, but feel free to have a look at this yourself and 
> > see what you can come up with.
> 
> What about something like this?
> 
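> import java.io.Reader;
> import java.util.HashMap;
> import java.util.Map;
> 
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.TokenStream;
> 
> public class PerFieldWrapperAnalyzer extends Analyzer {
>    // Used for any field that has no analyzer of its own.
>    private Analyzer defaultAnalyzer;
>    // Maps a field name (String) to the Analyzer for that field.
>    private Map analyzerMap = new HashMap();
> 
>    public PerFieldWrapperAnalyzer(Analyzer defaultAnalyzer) {
>      this.defaultAnalyzer = defaultAnalyzer;
>    }
> 
>    // Registers a special analyzer for a single field.
>    public void addAnalyzer(String fieldName, Analyzer analyzer) {
>      analyzerMap.put(fieldName, analyzer);
>    }
> 
>    // Delegates to the field's analyzer, or to the default if none is set.
>    public TokenStream tokenStream(String fieldName, Reader reader) {
>      Analyzer analyzer = (Analyzer) analyzerMap.get(fieldName);
>      if (analyzer == null) {
>        analyzer = defaultAnalyzer;
>      }
> 
>      return analyzer.tokenStream(fieldName, reader);
>    }
> }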
> 
> This would allow you to construct a single analyzer out of others, on a 
> per-field basis, including a default one for any fields that do not 
> have a special one.  Whether the constructor should take the map or an 
> addAnalyzer method should be provided is debatable, but I prefer the 
> addAnalyzer way.  Maybe addAnalyzer could return 'this' so you could 
> chain:
> 
>    new PerFieldWrapperAnalyzer(new StandardAnalyzer())
>        .addAnalyzer("field1", new WhitespaceAnalyzer())
>        .addAnalyzer(.....)
> 
> And I'm more inclined to call this thing PerFieldAnalyzerWrapper 
> instead.  Any naming suggestions?
> 
> This simple little class would seem to be the answer to a very common 
> question asked.
> 
> Thoughts?  Should this be made part of the core?
> 
> Erik
> 
> 
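To try the chaining idea from the quoted message without Lucene on the classpath, here is a minimal sketch. Everything in it is an assumption for illustration: SimpleAnalyzer is a hypothetical stand-in for Lucene's Analyzer (it returns plain strings instead of a TokenStream), PerFieldAnalyzerWrapperSketch is not a Lucene class, and it uses generics and lambdas from newer Java than the code above. It shows addAnalyzer returning 'this', applied to the original request: drop numbers from "content" but keep them in "created_date".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Lucene's Analyzer: returns tokens as strings.
interface SimpleAnalyzer {
    String[] tokens(String fieldName, String text);
}

// Sketch of the wrapper idea: one analyzer per field, plus a default.
class PerFieldAnalyzerWrapperSketch implements SimpleAnalyzer {
    private final SimpleAnalyzer defaultAnalyzer;
    private final Map<String, SimpleAnalyzer> analyzerMap = new HashMap<>();

    PerFieldAnalyzerWrapperSketch(SimpleAnalyzer defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Returns 'this' so that calls can be chained.
    PerFieldAnalyzerWrapperSketch addAnalyzer(String fieldName, SimpleAnalyzer analyzer) {
        analyzerMap.put(fieldName, analyzer);
        return this;
    }

    // Delegates to the field's analyzer, or to the default if none is mapped.
    @Override
    public String[] tokens(String fieldName, String text) {
        SimpleAnalyzer analyzer = analyzerMap.getOrDefault(fieldName, defaultAnalyzer);
        return analyzer.tokens(fieldName, text);
    }
}

class PerFieldDemo {
    public static void main(String[] args) {
        // Default: split on whitespace and keep every token.
        SimpleAnalyzer whitespace = (field, text) -> text.trim().split("\\s+");
        // Special case: split on whitespace, then drop all-digit tokens.
        SimpleAnalyzer noNumbers = (field, text) ->
            Arrays.stream(text.trim().split("\\s+"))
                  .filter(token -> !token.matches("\\d+"))
                  .toArray(String[]::new);

        SimpleAnalyzer analyzer = new PerFieldAnalyzerWrapperSketch(whitespace)
            .addAnalyzer("content", noNumbers);

        // prints [total, items]
        System.out.println(Arrays.toString(analyzer.tokens("content", "total 42 items")));
        // prints [20030919]
        System.out.println(Arrays.toString(analyzer.tokens("created_date", "20030919")));
    }
}
```

Only "content" is mapped, so "created_date" falls through to the default analyzer and keeps its numeric token.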
