From: Ahmet Arslan
Subject: Re: are long words split into up to 256 long tokens?
To: java-user@lucene.apache.org
Date: Wed, 21 Apr 2010 06:50:15 -0700 (PDT)

> Is 256 also some internal maximum in Lucene that causes this?
> What seems to be happening is that the long word is split into
> smaller words of up to 256 characters, and then the min and max
> limits are applied. Is that correct? I have removed LengthFilter
> and still see the splitting at 256 happen. I would like to avoid
> this and remove altogether any word longer than max, without
> decomposing it into smaller ones. Is there a way to achieve this?
>
> Using Lucene 3.0.1

Assuming your Tokenizer extends CharTokenizer, CharTokenizer.java has this field:

    private static final int MAX_WORD_LEN = 255;

Since the field is private, static and final, it cannot be overridden from a subclass; you can modify CharTokenizer.java itself according to your needs.
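To make the difference between splitting a long word and dropping it concrete, here is a minimal stand-alone sketch. It is not Lucene code and does not reproduce CharTokenizer: the class LongWordSketch, the tokenize method and its parameters are hypothetical names chosen for illustration, and only letters are treated as token characters. A limit of 3 is used instead of 255 to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Lucene's CharTokenizer.
final class LongWordSketch {

    // Splits text into runs of letters. When a run grows past maxWordLen:
    //  - dropLongWords == false: the run is cut at maxWordLen, emitted, and the
    //    remaining characters start a new token (the splitting seen in the question);
    //  - dropLongWords == true: the whole run is discarded.
    static List<String> tokenize(String text, int maxWordLen, boolean dropLongWords) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean overflow = false; // true while inside a run longer than maxWordLen

        // Iterate one position past the end so the last run is flushed.
        for (int i = 0; i <= text.length(); i++) {
            boolean isTokenChar = i < text.length() && Character.isLetter(text.charAt(i));
            if (isTokenChar) {
                if (current.length() == maxWordLen) {
                    if (dropLongWords) {
                        overflow = true;                // remember to discard this run
                    } else {
                        tokens.add(current.toString()); // emit the full buffer
                        current.setLength(0);           // and continue with the rest
                    }
                }
                if (!overflow) {
                    current.append(text.charAt(i));
                }
            } else {
                // Delimiter or end of input: emit the run unless it overflowed.
                if (!overflow && current.length() > 0) {
                    tokens.add(current.toString());
                }
                current.setLength(0);
                overflow = false;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "abcdefg hi";
        System.out.println(tokenize(text, 3, false)); // splitting mode
        System.out.println(tokenize(text, 3, true));  // dropping mode
    }
}
```

In splitting mode the word "abcdefg" comes out as three tokens; in dropping mode it disappears and only "hi" remains. A modified copy of CharTokenizer.java could follow the same idea, but the exact change depends on the Lucene version in use.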