lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject per-field Analyzer (was Re: some requests)
Date Sat, 20 Sep 2003 08:13:44 GMT
On Friday, September 19, 2003, at 07:45  PM, Erik Hatcher wrote:
> On Friday, September 19, 2003, at 11:15  AM, hui wrote:
>> 1. Move the Analyzer down to field level from document level so some
>> fields could have a special analyzer applied. Other fields still use
>> the default analyzer from the document level.
>> For example, I do not need to index the numbers for the "content"
>> field. It helps me reduce the index size a lot when I have some Excel
>> files. But I always need the "created_date" to be indexed though it is
>> a number field.
>>
>> I know there are some workarounds posted in the group, but I think it
>> would be a good feature to have.
>
> The "workaround" is to write a custom analyzer and have it do the 
> desired thing per-field.
>
> Hmmm.... just thinking out loud here without knowing if this is 
> possible, but could a generic "wrapper" Analyzer be written that 
> allows other analyzers to be used under the covers based on a field 
> name/analyzer mapping?   If so, that would be quite cool and save 
> folks from having to write custom analyzers as much to handle this 
> pretty typical use-case.  I'll look into this more in the very near 
> future personally, but feel free to have a look at this yourself and 
> see what you can come up with.

What about something like this?

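import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;

public class PerFieldWrapperAnalyzer extends Analyzer {
   // Used for any field without an explicit mapping.
   private Analyzer defaultAnalyzer;
   // Maps field name (String) to the Analyzer for that field.
   private Map analyzerMap = new HashMap();

   public PerFieldWrapperAnalyzer(Analyzer defaultAnalyzer) {
     this.defaultAnalyzer = defaultAnalyzer;
   }

   // Registers a special analyzer for one field.
   public void addAnalyzer(String fieldName, Analyzer analyzer) {
     analyzerMap.put(fieldName, analyzer);
   }

   // Delegates to the field's analyzer, falling back to the default.
   public TokenStream tokenStream(String fieldName, Reader reader) {
     Analyzer analyzer = (Analyzer) analyzerMap.get(fieldName);
     if (analyzer == null) {
       analyzer = defaultAnalyzer;
     }

     return analyzer.tokenStream(fieldName, reader);
   }
}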

This would allow you to construct a single analyzer out of others, on a 
per-field basis, including a default one for any fields that do not 
have a special one.  Whether the constructor should take the map or an 
addAnalyzer method should be provided is debatable, but I prefer the 
addAnalyzer way.  Maybe addAnalyzer could return 'this' so you could 
chain: new PerFieldWrapperAnalyzer(new 
StandardAnalyzer()).addAnalyzer("field1", new 
WhitespaceAnalyzer()).addAnalyzer(.....).  And I'm more inclined to 
call this thing PerFieldAnalyzerWrapper instead.  Any naming 
suggestions?
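
To make the chaining idea concrete, here is a rough sketch of the "return this" variant.  It is only an illustration under stated assumptions: SketchAnalyzer, LowerCaseSketchAnalyzer, KeepAsIsSketchAnalyzer and ChainingPerFieldAnalyzer are hypothetical stand-ins invented for this example (they return a String rather than a TokenStream) so that it compiles without Lucene on the classpath.  They are not Lucene classes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for an analyzer: takes a field name and text and
// returns the "analyzed" text. Not part of Lucene.
abstract class SketchAnalyzer {
    abstract String analyze(String fieldName, String text);
}

// Hypothetical analyzer that lower-cases its input.
class LowerCaseSketchAnalyzer extends SketchAnalyzer {
    String analyze(String fieldName, String text) {
        return text.toLowerCase(Locale.ROOT);
    }
}

// Hypothetical analyzer that returns its input unchanged.
class KeepAsIsSketchAnalyzer extends SketchAnalyzer {
    String analyze(String fieldName, String text) {
        return text;
    }
}

// Per-field wrapper whose addAnalyzer returns 'this' to allow chaining.
class ChainingPerFieldAnalyzer extends SketchAnalyzer {
    private final SketchAnalyzer defaultAnalyzer;
    private final Map<String, SketchAnalyzer> analyzerMap = new HashMap<>();

    ChainingPerFieldAnalyzer(SketchAnalyzer defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Registers an analyzer for one field and returns this wrapper.
    ChainingPerFieldAnalyzer addAnalyzer(String fieldName, SketchAnalyzer analyzer) {
        analyzerMap.put(fieldName, analyzer);
        return this;
    }

    // Delegates to the field's analyzer, falling back to the default.
    String analyze(String fieldName, String text) {
        SketchAnalyzer analyzer = analyzerMap.get(fieldName);
        if (analyzer == null) {
            analyzer = defaultAnalyzer;
        }
        return analyzer.analyze(fieldName, text);
    }
}
```

With this shape, the whole configuration can be written as one expression, for example new ChainingPerFieldAnalyzer(new LowerCaseSketchAnalyzer()).addAnalyzer("created_date", new KeepAsIsSketchAnalyzer()), and any field without a mapping falls through to the default.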

This simple little class would seem to be the answer to a very commonly 
asked question.

Thoughts?  Should this be made part of the core?

	Erik

