lucene-java-user mailing list archives

From Geoff Cooney <cooney.ge...@gmail.com>
Subject Re: content disappears in the index
Date Tue, 13 Nov 2012 16:56:55 GMT
Hi,

I've been following this thread and happen to have a simple
TruncatingFilter class I wrote for the same purpose.  I think this should
do what you want:



import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

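/**
 * Truncates every token to at most maxLength characters.
 * Offsets are left untouched, so they still span the full original word.
 */
public final class TruncatingFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final int maxLength;

    // Public so that an Analyzer or factory in another package can create it.
    public TruncatingFilter(TokenStream input, int maxLength) {
        super(input);
        this.maxLength = maxLength;
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false; // end of the upstream token stream
        }
        // Shorten the term in place; terms within the limit pass through unchanged.
        if (termAtt.length() > maxLength) {
            termAtt.setLength(maxLength);
        }
        return true;
    }
}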
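
To see the per-token effect of the filter without pulling in Lucene, here is a minimal plain-Java sketch. It assumes a StringBuilder as a stand-in for CharTermAttribute (both expose setLength), and the class TruncationDemo and helper truncateAll are hypothetical names used only for illustration; this is not Lucene's API.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only: StringBuilder stands in for Lucene's CharTermAttribute.
public class TruncationDemo {

    // Hypothetical helper: applies the same rule as TruncatingFilter.incrementToken()
    // to each term in turn.
    static List<String> truncateAll(List<String> terms, int maxLength) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            StringBuilder buf = new StringBuilder(term); // stand-in for termAtt
            if (buf.length() > maxLength) {
                buf.setLength(maxLength); // mirrors termAtt.setLength(maxLength)
            }
            out.add(buf.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("content", "disappears", "in", "the", "index");
        System.out.println(truncateAll(terms, 5)); // prints [conte, disap, in, the, index]
    }
}
```

Note that "index" is exactly five characters, so it is not shortened: the filter only cuts terms strictly longer than maxLength.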

Cheers,
Geoff


On Tue, Nov 13, 2012 at 7:54 AM, Erick Erickson <erickerickson@gmail.com> wrote:

> There's nothing in Solr that I know of that does this. It would be a pretty
> easy custom filter to create, though.
>
> FWIW,
> Erick
>
>
> On Tue, Nov 13, 2012 at 7:02 AM, Robert Muir <rcmuir@gmail.com> wrote:
>
> > On Mon, Nov 12, 2012 at 10:47 PM, Bernd Fehling
> > <bernd.fehling@uni-bielefeld.de> wrote:
> > > By the way, why does the TrimFilter option updateOffset default to false?
> > > Just to keep it backwards compatible?
> > >
> >
> > In my opinion this option should be removed.
> >
> > TokenFilters shouldn't muck with offsets, for a lot of reasons, but
> > especially because it's too late to interact with any charfilter.
> >
> > This is the tokenizer's job.
> >
>
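
The point about leaving offsets to the tokenizer can be illustrated with a small plain-Java sketch. It assumes a hypothetical SimpleToken record (term plus start/end offsets) in place of Lucene's attributes; OffsetDemo and truncate are illustrative names, not part of Lucene. Because the filter changes only the term text, the tokenizer's offsets still select the whole original word, which is what a highlighter would rely on.

```java
// Plain-Java illustration; SimpleToken is hypothetical, not Lucene's attribute API.
public class OffsetDemo {

    // A token as seen after tokenization: indexed term plus offsets into the source text.
    record SimpleToken(String term, int startOffset, int endOffset) {}

    // Shorten only the term text; keep the tokenizer-assigned offsets as they are.
    static SimpleToken truncate(SimpleToken tok, int maxLength) {
        String term = tok.term().length() > maxLength
                ? tok.term().substring(0, maxLength)
                : tok.term();
        return new SimpleToken(term, tok.startOffset(), tok.endOffset());
    }

    public static void main(String[] args) {
        String text = "content disappears";
        // Offsets a whitespace tokenizer would assign to "disappears" in the text above.
        SimpleToken original = new SimpleToken("disappears", 8, 18);
        SimpleToken cut = truncate(original, 5);

        System.out.println(cut.term()); // prints disap
        // The unchanged offsets still cover the full word in the original text.
        System.out.println(text.substring(cut.startOffset(), cut.endOffset())); // prints disappears
    }
}
```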
