From: "Eric Jain"
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Subject: Re: search item with '-' in it
Date: Thu, 5 Jun 2003 09:14:44 +0200
Organization: Swiss Institute of Bioinformatics

> If we change
> StandardTokenizer in this way then we risk breaking all
> the applications that currently use it and depend on its current
> behaviour.

My personal issue with the StandardTokenizer is that it splits off single-letter prefixes, as in 't-shirt'. A query for 't-shirt' therefore also returns documents containing 't. miller's shirt'. I can't imagine how this behaviour could ever be considered useful or depended upon, but I may be wrong (perhaps someone has an example where it does make sense).

--
Eric Jain

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
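[Archive note] The conflation Eric describes can be sketched with a toy tokenizer. This is not Lucene's actual StandardTokenizer grammar (which is JavaCC-generated and handles many more cases); `naive_tokenize` is a hypothetical stand-in that, like the behaviour complained about, splits on the hyphen and on punctuation, so 't-shirt' and "t. miller's shirt" produce overlapping token streams:

```python
import re

def naive_tokenize(text):
    # Hypothetical illustration, NOT Lucene's real grammar: lowercase,
    # then split on any run of non-alphanumeric characters. Hyphens and
    # periods are both treated as separators, so the single-letter
    # prefix 't' is emitted as its own token in both inputs below.
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]

print(naive_tokenize("t-shirt"))            # ['t', 'shirt']
print(naive_tokenize("t. miller's shirt"))  # ['t', 'miller', 's', 'shirt']
```

Because the query 't-shirt' is analysed into the tokens `t` and `shirt`, a document indexed from "t. miller's shirt" (which also yields both `t` and `shirt`) matches it, which is the false hit described above.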