From: Erik Hatcher
Subject: Re: Strange tokenization with StandardFilter
Date: Wed, 23 Nov 2005 05:12:15 -0500
To: java-user@lucene.apache.org

On 21 Nov 2005, at 18:54, yahootintin.11533894@bloglines.com wrote:
> I'm using a StandardFilter and seeing some strange tokenization.
>
> Here's the input:
> apache.org hosts lucene at apache.org.
>
> Here's the tokens it outputs:
> apache.org
> hosts
> lucene
> at
> apacheorg
>
> Is this a bug that apache.org and apache.org. don't convert to the same token?

Didn't you just report this same issue?

The behavior certainly is not sensible in this case. So I'd call it a bug, yes. Again, the trailing '.' is the culprit.

    Erik
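
To illustrate how a trailing '.' could turn "apache.org." into "apacheorg", here is a minimal sketch in plain Java. It is not Lucene code. It assumes (my assumption, not something verified against the Lucene source in this thread) that the tokenizer treats a run of letter groups, each followed by a dot, as an acronym, and that the filter then strips the dots from acronyms. The class name, the regular expression and the normalize method are all hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical, simplified model of the behavior discussed above; not Lucene code.
final class TrailingDotSketch {

    // Assumed acronym shape: letters followed by a dot, repeated at least twice,
    // e.g. "U.S.A." and, because of the sentence-ending dot, "apache.org.".
    private static final Pattern ACRONYM =
        Pattern.compile("[A-Za-z]+\\.(?:[A-Za-z]+\\.)+");

    // Assumed filter step: dots are removed from anything classified as an acronym;
    // every other token is passed through unchanged.
    static String normalize(String token) {
        if (ACRONYM.matcher(token).matches()) {
            return token.replace(".", "");
        }
        return token;
    }

    public static void main(String[] args) {
        for (String t : new String[] {"apache.org", "apache.org.", "U.S.A."}) {
            System.out.println(t + " -> " + normalize(t));
        }
    }
}
```

Under these assumptions "apache.org" does not end in a dot, so it is left alone, while "apache.org." has the same shape as "U.S.A." and loses its dots. A possible workaround, if this is indeed the cause, would be to strip sentence-final punctuation before the text reaches the analyzer.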