Date: Tue, 13 Nov 2012 11:56:55 -0500
Message-ID: 
 <CA+1aqq1Rro7+U7EduadDUJxGtK5Op7Wjw7yxKwZ_doLoTfXJ5w@mail.gmail.com>
Subject: Re: content disappears in the index
From: Geoff Cooney <cooney.geoff@gmail.com>
To: java-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=20cf300fafd72979bc04ce634eec

--20cf300fafd72979bc04ce634eec
Content-Type: text/plain; charset=ISO-8859-1

Hi,

I've been following this thread and happen to have a simple
TruncatingFilter class I wrote for the same purpose.  I think this should
do what you want:


import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

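// Truncates each token's term text to at most maxLength chars.
// Declared final because Lucene's TokenStream asserts that implementations
// (or at least their incrementToken()) are final.
public final class TruncatingFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final int maxLength;

    // Public so the filter can be constructed from an analyzer in another package.
    public TruncatingFilter(TokenStream input, int maxLength) {
        super(input);
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be >= 0: " + maxLength);
        }
        this.maxLength = maxLength;
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        // Shorten over-long terms in place; offsets are left untouched.
        if (termAtt.length() > maxLength) {
            termAtt.setLength(maxLength);
        }
        return true;
    }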

}
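
One caveat with the filter above: setLength() cuts on a UTF-16 char
boundary, so a term containing supplementary characters could be cut in the
middle of a surrogate pair. Below is a minimal sketch of a guard against
that. TruncationSketch and safeLength are hypothetical names of my own, not
part of Lucene, and the helper is plain Java so it can be tried without
Lucene on the classpath:

```java
// Hypothetical helper (not part of Lucene): picks a truncation length
// that never splits a UTF-16 surrogate pair.
final class TruncationSketch {
    private TruncationSketch() {}

    /**
     * Returns the number of chars to keep from term so that the result is
     * at most maxLength chars long and does not end in a lone high surrogate.
     */
    static int safeLength(CharSequence term, int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be >= 0: " + maxLength);
        }
        if (term.length() <= maxLength) {
            return term.length(); // already short enough, keep everything
        }
        // The cut falls between index maxLength - 1 and index maxLength.
        // If that would separate a high surrogate from its low surrogate,
        // drop the high surrogate as well.
        if (maxLength > 0
                && Character.isHighSurrogate(term.charAt(maxLength - 1))
                && Character.isLowSurrogate(term.charAt(maxLength))) {
            return maxLength - 1;
        }
        return maxLength;
    }
}
```

Since CharTermAttribute is a CharSequence, the filter could then call
termAtt.setLength(TruncationSketch.safeLength(termAtt, maxLength)) instead of
termAtt.setLength(maxLength). Whether that is worth it depends on whether
your terms can contain characters outside the Basic Multilingual Plane.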

Cheers,
Geoff


On Tue, Nov 13, 2012 at 7:54 AM, Erick Erickson <erickerickson@gmail.com> wrote:

> There's nothing in Solr that I know of that does this. It would be a pretty
> easy custom filter to create though....
>
> FWIW,
> Erick
>
>
> On Tue, Nov 13, 2012 at 7:02 AM, Robert Muir <rcmuir@gmail.com> wrote:
>
> > On Mon, Nov 12, 2012 at 10:47 PM, Bernd Fehling
> > <bernd.fehling@uni-bielefeld.de> wrote:
> > > By the way, why does the TrimFilter option updateOffset default to false?
> > > Just to keep it backwards compatible?
> > >
> >
> > In my opinion this option should be removed.
> >
> > TokenFilters shouldn't muck with offsets, for a lot of reasons, but
> > especially because it's too late to interact with any charfilter.
> >
> > This is the tokenizer's job.
> >
>

--20cf300fafd72979bc04ce634eec--