From: Geoff Cooney
To: java-user@lucene.apache.org
Subject: Re: content disappears in the index
Date: Tue, 13 Nov 2012 11:56:55 -0500

Hi,

I've been following this thread and happen to have a simple
TruncatingFilter class that I wrote for the same purpose. I think this
should do what you want:

    import java.io.IOException;

    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    // Truncates every token to at most maxLength chars.
    // Declared final because Lucene asserts that TokenStream implementations
    // (or at least their incrementToken()) are final.
    public final class TruncatingFilter extends TokenFilter {

      private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
      private final int maxLength;

      // Public so that an analyzer in another package can construct the filter.
      public TruncatingFilter(TokenStream input, int maxLength) {
        super(input);
        this.maxLength = maxLength;
      }

      @Override
      public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
          return false; // the wrapped stream is exhausted
        }
        // Shorten only the term text; offsets are left untouched.
        if (termAtt.length() > maxLength) {
          termAtt.setLength(maxLength);
        }
        return true;
      }
    }

Cheers,
Geoff

On Tue, Nov 13, 2012 at 7:54 AM, Erick Erickson wrote:

> There's nothing in Solr that I know of that does this. It would be a pretty
> easy custom filter to create though....
>
> FWIW,
> Erick
>
>
> On Tue, Nov 13, 2012 at 7:02 AM, Robert Muir wrote:
>
> > On Mon, Nov 12, 2012 at 10:47 PM, Bernd Fehling wrote:
> > > By the way, why does the TrimFilter option updateOffset default to false,
> > > just to keep it backwards compatible?
> > >
> >
> > In my opinion this option should be removed.
> >
> > TokenFilters shouldn't muck with offsets, for a lot of reasons, but
> > especially because it's too late to interact with any charfilter.
> >
> > This is the tokenizer's job.
> >