From: "Jack Krupansky"
Subject: Re: WhiteSpaceTokenizer
Date: Fri, 15 Aug 2014 07:48:53 -0400
Message-ID: <8CEBD9011EF94C3EAA9E352C562663FB@JackKrupansky14>

Yeah, it should be documented better, and configurable. Some discussion of related issues here:

https://issues.apache.org/jira/browse/LUCENE-1118
https://issues.apache.org/jira/browse/SOLR-4148

I actually filed a Jira for this already. No action so far, but PLEASE feel free to comment on it:

https://issues.apache.org/jira/browse/LUCENE-5785

-- Jack Krupansky

-----Original Message-----
From: Sheng
Sent: Thursday, August 14, 2014 11:38 PM
To: java-user@lucene.apache.org
Subject: WhiteSpaceTokenizer

The length of a token has to be shorter than 255; otherwise this tokenizer behaves unpredictably.
I see that 255 is set as a private final in the source code, but no documentation explicitly addresses it. Can we either make that number configurable (if that is not an option, I'd like to know why), or add a note to its Javadoc? I had a hard time figuring that out...
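
For anyone who runs into the same limit, here is a minimal sketch of what a hard per-token cap can look like from the outside. It is not Lucene code: the class `CappedWhitespaceSketch` and its `tokenize` method are hypothetical, and the sketch assumes that an over-long run of non-whitespace characters is cut at the cap and the remainder is emitted as a separate token. Whether the real tokenizer does exactly this should be checked against the source for your version.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in (not Lucene code): whitespace splitting with a hard per-token cap. */
public class CappedWhitespaceSketch {

    /**
     * Splits text on whitespace. Any run of non-whitespace characters that reaches
     * maxTokenLength is emitted immediately, and the rest of the run starts a new token.
     * Lengths are counted in Java chars, not Unicode code points.
     */
    static List<String> tokenize(String text, int maxTokenLength) {
        if (maxTokenLength < 1) {
            throw new IllegalArgumentException("maxTokenLength must be at least 1");
        }
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends the current token, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
                // Assumed behavior: the cap silently splits the token here.
                if (current.length() >= maxTokenLength) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        StringBuilder longWord = new StringBuilder();
        for (int i = 0; i < 300; i++) {
            longWord.append('x');
        }
        // One short word followed by a single 300-character "word".
        for (String token : tokenize("short " + longWord, 255)) {
            System.out.println(token.length());
        }
    }
}
```

Under that assumption, the 300-character word comes back as two tokens of 255 and 45 characters, with nothing to signal that a split happened. That would explain why the behavior looks unpredictable until you find the constant.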