lucene-java-user mailing list archives

From Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
Subject Re: content disappears in the index
Date Wed, 14 Nov 2012 07:05:41 GMT
Hi Geoff,
Cool, that will eliminate possible regex pitfalls in schema.xml.

I was thinking about enhancing an existing filter into a multi-purpose filter.
E.g. TrimFilter: if maxLength is set, it would also limit the termAtt to maxLength.
This will keep the number of available filters small, especially for simple tasks.
Any thoughts from the core developers about this idea?
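A rough, standalone sketch of what the combined trim-plus-maxLength behaviour could do to a single term. It uses no Lucene classes: the char[]/length pair stands in for CharTermAttribute's buffer() and length(). TrimTruncate and trimAndTruncate are hypothetical names, Character.isWhitespace is assumed as the whitespace test, and maxLength <= 0 is assumed to mean "not set".

```java
/**
 * Hypothetical sketch (not Lucene API): trim + optional truncation applied
 * to a term held in a char buffer, the way a TokenFilter sees it.
 */
public final class TrimTruncate {

    /**
     * Trims leading/trailing whitespace from buf[0..length) in place and,
     * if maxLength > 0, cuts the result to at most maxLength chars.
     * Returns the new term length.
     */
    public static int trimAndTruncate(char[] buf, int length, int maxLength) {
        int start = 0;
        int end = length;
        while (start < end && Character.isWhitespace(buf[start])) {
            start++;
        }
        while (end > start && Character.isWhitespace(buf[end - 1])) {
            end--;
        }
        int newLength = end - start;
        if (start > 0) {
            // Shift the kept chars to the front of the buffer.
            System.arraycopy(buf, start, buf, 0, newLength);
        }
        if (maxLength > 0 && newLength > maxLength) {
            newLength = maxLength; // the optional "maxLength" step
        }
        return newLength;
    }

    public static void main(String[] args) {
        char[] term = "  content disappears  ".toCharArray();
        int len = trimAndTruncate(term, term.length, 7);
        System.out.println("[" + new String(term, 0, len) + "]"); // prints [content]
    }
}
```

Trimming before truncating means the length limit applies to the trimmed term, so leading blanks do not eat into maxLength.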

Regards
Bernd


On 13.11.2012 17:56, Geoff Cooney wrote:
> Hi,
> 
> I've been following this thread and happen to have a simple
> TruncatingFilter class I wrote for the same purpose.  I think this should
> do what you want:
> 
> 
> 
> import java.io.IOException;
> 
> import org.apache.lucene.analysis.TokenFilter;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
> 
> public final class TruncatingFilter extends TokenFilter {
>     private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
>     private final int maxLength;
> 
>     // public so that a factory in another package can construct it
>     public TruncatingFilter(TokenStream input, int maxLength) {
>         super(input);
>         if (maxLength < 0) {
>             throw new IllegalArgumentException("maxLength must be >= 0: " + maxLength);
>         }
>         this.maxLength = maxLength;
>     }
> 
>     @Override
>     public boolean incrementToken() throws IOException {
>         if (!input.incrementToken()) {
>             return false; // end of stream
>         }
>         // Cut terms longer than maxLength chars; shorter terms pass unchanged.
>         if (termAtt.length() > maxLength) {
>             termAtt.setLength(maxLength);
>         }
>         return true;
>     }
> }
> 
> Cheers,
> Geoff
> 
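One caveat with truncating by termAtt.length(): CharTermAttribute counts UTF-16 chars, so cutting at exactly maxLength can split a surrogate pair and leave an unpaired high surrogate in the term. Below is a minimal standalone guard; SafeTruncate and safeLength are hypothetical names. Inside incrementToken() it could be applied as termAtt.setLength(safeLength(termAtt.buffer(), termAtt.length(), maxLength)).

```java
/** Hypothetical helper: truncation length that never splits a surrogate pair. */
public final class SafeTruncate {

    /** Returns a length <= maxLength that does not end between a high and a low surrogate. */
    public static int safeLength(char[] buf, int length, int maxLength) {
        if (length <= maxLength) {
            return length; // nothing to cut
        }
        int cut = maxLength;
        // If the last kept char is a high surrogate, its low surrogate would be
        // dropped; back off by one so the supplementary character goes away whole.
        if (cut > 0 && Character.isHighSurrogate(buf[cut - 1])) {
            cut--;
        }
        return cut;
    }

    public static void main(String[] args) {
        // "ab" followed by U+1D11E (one code point, two UTF-16 chars)
        char[] term = "ab\uD834\uDD1E".toCharArray();
        System.out.println(safeLength(term, term.length, 3)); // prints 2
        System.out.println(safeLength(term, term.length, 4)); // prints 4
    }
}
```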
> 
> On Tue, Nov 13, 2012 at 7:54 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> 
>> There's nothing in Solr that I know of that does this. It would be a pretty
>> easy custom filter to create though....
>>
>> FWIW,
>> Erick
>>
>>
>> On Tue, Nov 13, 2012 at 7:02 AM, Robert Muir <rcmuir@gmail.com> wrote:
>>
>>> On Mon, Nov 12, 2012 at 10:47 PM, Bernd Fehling
>>> <bernd.fehling@uni-bielefeld.de> wrote:
>>>> By the way, why does the TrimFilter option updateOffset default to false?
>>>> Just to keep it backwards compatible?
>>>>
>>>
>>> In my opinion this option should be removed.
>>>
>>> TokenFilters shouldn't muck with offsets, for a lot of reasons, but
>>> especially because it's too late to interact with any charfilter.
>>>
>>> This is the tokenizer's job.
>>>
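A toy, Lucene-free simulation of why a TokenFilter is too late to fix offsets once a char filter has rewritten the text. The tokenizer can still map offsets back through the char filter (in Lucene via Tokenizer.correctOffset), while a TokenFilter only sees offsets that were already corrected. The "char filter" here is hypothetical and just decodes "&#32;" to a space; the naive filter is assumed to add the number of trimmed chars to startOffset, which is not necessarily what TrimFilter itself does.

```java
/** Toy simulation (no Lucene classes) of offset correction vs. a late TokenFilter. */
public final class OffsetDemo {
    // Hypothetical char filter: decodes "&#32;" to ' '.
    static final String ORIGINAL = "&#32;ab";
    static final String FILTERED = " ab"; // what the tokenizer actually reads

    /** Maps an offset in FILTERED back to ORIGINAL, like a char filter's offset correction. */
    static int correctOffset(int filteredOffset) {
        // The single filtered char ' ' at index 0 came from 5 original chars.
        return filteredOffset == 0 ? 0 : filteredOffset + 4;
    }

    public static void main(String[] args) {
        // A keyword-style tokenizer emits the whole input as one token, with
        // offsets already corrected back into ORIGINAL: [0, 7).
        int start = correctOffset(0);
        int end = correctOffset(FILTERED.length());

        // Trimming in the tokenizer, where the correction is still available:
        int rightStart = correctOffset(1); // 5

        // Trimming in a filter that only has [0, 7) and assumes one original
        // char per trimmed char:
        int naiveStart = start + 1; // 1

        System.out.println(ORIGINAL.substring(rightStart, end)); // prints ab
        System.out.println(ORIGINAL.substring(naiveStart, end)); // prints #32;ab
    }
}
```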


