lucene-dev mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Fwd: per-field Analyzer (was Re: some requests)
Date Mon, 22 Sep 2003 18:16:18 GMT
Any objections or commentary about adding this to Lucene's core?

	Erik


Begin forwarded message:

> From: "hui" <hui@triplehop.com>
> Date: Mon Sep 22, 2003  9:40:05  AM US/Eastern
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Subject: Re: per-field Analyzer (was Re: some requests)
> Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>
> Good work, Erik.
>
> Hui
>
> ----- Original Message -----
> From: "Erik Hatcher" <erik@ehatchersolutions.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Saturday, September 20, 2003 4:13 AM
> Subject: per-field Analyzer (was Re: some requests)
>
>
>> On Friday, September 19, 2003, at 07:45  PM, Erik Hatcher wrote:
>>> On Friday, September 19, 2003, at 11:15  AM, hui wrote:
>>>> 1. Move the Analyzer down from the document level to the field level
>>>> so that some fields can use a special analyzer. Other fields would
>>>> still use the default analyzer from the document level.
>>>> For example, I do not need to index numbers in the "content" field.
>>>> That reduces the index size a lot when I have some Excel files. But I
>>>> always need "created_date" to be indexed, even though it is a numeric
>>>> field.
>>>>
>>>> I know some workarounds have been posted to the group, but I think it
>>>> would be a good feature to have.
>>>
>>> The "workaround" is to write a custom analyzer and have it do the
>>> desired thing per-field.
>>>
>>> Hmmm.... just thinking out loud here without knowing if this is
>>> possible, but could a generic "wrapper" Analyzer be written that
>>> allows other analyzers to be used under the covers based on a field
>>> name/analyzer mapping?   If so, that would be quite cool and save
>>> folks from having to write custom analyzers as much to handle this
>>> pretty typical use-case.  I'll look into this more in the very near
>>> future personally, but feel free to have a look at this yourself and
>>> see what you can come up with.
>>
>> What about something like this?
>>
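>> import java.io.Reader;
>> import java.util.HashMap;
>> import java.util.Map;
>>
>> import org.apache.lucene.analysis.Analyzer;
>> import org.apache.lucene.analysis.TokenStream;
>>
>> /** Delegates to a per-field analyzer, falling back to a default. */
>> public class PerFieldWrapperAnalyzer extends Analyzer {
>>    private Analyzer defaultAnalyzer;
>>    // field name -> Analyzer to use for that field
>>    private Map analyzerMap = new HashMap();
>>
>>    public PerFieldWrapperAnalyzer(Analyzer defaultAnalyzer) {
>>      this.defaultAnalyzer = defaultAnalyzer;
>>    }
>>
>>    // Register a special analyzer for one field.
>>    public void addAnalyzer(String fieldName, Analyzer analyzer) {
>>      analyzerMap.put(fieldName, analyzer);
>>    }
>>
>>    public TokenStream tokenStream(String fieldName, Reader reader) {
>>      Analyzer analyzer = (Analyzer) analyzerMap.get(fieldName);
>>      if (analyzer == null) {
>>        // No special analyzer registered for this field.
>>        analyzer = defaultAnalyzer;
>>      }
>>
>>      return analyzer.tokenStream(fieldName, reader);
>>    }
>> }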
>>
>> This would allow you to construct a single analyzer out of others, on
>> a per-field basis, including a default one for any fields that do not
>> have a special one.  Whether the constructor should take the map or
>> an addAnalyzer method should be provided is debatable, but I prefer
>> the addAnalyzer way.  Maybe addAnalyzer could return 'this' so you
>> could chain: new PerFieldWrapperAnalyzer(new
>> StandardAnalyzer()).addAnalyzer("field1", new
>> WhitespaceAnalyzer()).addAnalyzer(.....).  And I'm more inclined to
>> call this thing PerFieldAnalyzerWrapper instead.  Any naming
>> suggestions?
>>
>> This simple little class would seem to be the answer to a very
>> commonly asked question.
>>
>> Thoughts?  Should this be made part of the core?
>>
>> Erik
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
>
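
The chaining variant of addAnalyzer floated in the quoted message can be sketched as below. This is only an illustration under stated assumptions, not the code proposed for Lucene's core: FieldAnalyzer, WhitespaceSplitter, NumberDroppingSplitter and PerFieldWrapper are hypothetical stand-ins that work on plain strings so the example runs without Lucene on the classpath. The field names "content" and "created_date" come from hui's example; the sample text is made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Lucene's Analyzer: turns field text into tokens.
interface FieldAnalyzer {
    List<String> tokens(String fieldName, String text);
}

// Stand-in default analyzer: splits on whitespace and keeps every token.
final class WhitespaceSplitter implements FieldAnalyzer {
    public List<String> tokens(String fieldName, String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }
}

// Stand-in special analyzer: like the default, but drops all-digit tokens.
final class NumberDroppingSplitter implements FieldAnalyzer {
    private final FieldAnalyzer delegate = new WhitespaceSplitter();

    public List<String> tokens(String fieldName, String text) {
        List<String> out = new ArrayList<>();
        for (String t : delegate.tokens(fieldName, text)) {
            if (!t.matches("\\d+")) {
                out.add(t);
            }
        }
        return out;
    }
}

// Per-field wrapper: same lookup-then-fallback logic as the quoted class.
final class PerFieldWrapper implements FieldAnalyzer {
    private final FieldAnalyzer defaultAnalyzer;
    private final Map<String, FieldAnalyzer> analyzerMap = new HashMap<>();

    PerFieldWrapper(FieldAnalyzer defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Returns 'this' so registrations can be chained.
    PerFieldWrapper addAnalyzer(String fieldName, FieldAnalyzer analyzer) {
        analyzerMap.put(fieldName, analyzer);
        return this;
    }

    public List<String> tokens(String fieldName, String text) {
        FieldAnalyzer analyzer = analyzerMap.get(fieldName);
        if (analyzer == null) {
            analyzer = defaultAnalyzer; // no special analyzer for this field
        }
        return analyzer.tokens(fieldName, text);
    }
}

// Small demo of hui's use case: numbers dropped from "content" only.
final class PerFieldDemo {
    public static void main(String[] args) {
        FieldAnalyzer analyzer = new PerFieldWrapper(new WhitespaceSplitter())
            .addAnalyzer("content", new NumberDroppingSplitter());
        System.out.println(analyzer.tokens("content", "total 42 items 7"));
        System.out.println(analyzer.tokens("created_date", "20030919"));
    }
}
```

Returning 'this' keeps the one-field-at-a-time registration that the message prefers while still allowing the whole analyzer to be built in a single expression. A constructor taking a prebuilt map would be the alternative mentioned there.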

