From: "Eric Jain"
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Subject: Re: search item with '-' in it
Date: Thu, 5 Jun 2003 09:14:44 +0200
Organization: Swiss Institute of Bioinformatics

> If we change
> StandardTokenizer in this way then we risk breaking all
> the applications that currently use it and depend on its current
> behaviour.

My personal issue with the StandardTokenizer is that it splits off single-letter prefixes, as in 't-shirt'. A query for 't-shirt' therefore also returns documents containing 't. miller's shirt'. I can't imagine how this behaviour could ever be considered useful or depended upon, but I may be wrong (perhaps someone has an example where it does make sense).

--
Eric Jain

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
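[Archive note] The conflation Eric describes can be sketched with a toy tokenizer. This is not Lucene's actual StandardTokenizer grammar (which is JavaCC-generated and handles many more cases); `naive_tokenize` is a hypothetical stand-in that, like the behaviour complained about, splits on the hyphen and on punctuation, so 't-shirt' and "t. miller's shirt" produce overlapping token streams:

```python
import re

def naive_tokenize(text):
    # Hypothetical illustration, NOT Lucene's real grammar: lowercase,
    # then split on any run of non-alphanumeric characters. Hyphens and
    # periods are both treated as separators, so the single-letter
    # prefix 't' is emitted as its own token in both inputs below.
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]

print(naive_tokenize("t-shirt"))            # ['t', 'shirt']
print(naive_tokenize("t. miller's shirt"))  # ['t', 'miller', 's', 'shirt']
```

Because the query 't-shirt' is analysed into the tokens `t` and `shirt`, a document indexed from "t. miller's shirt" (which also yields both `t` and `shirt`) matches it, which is the false hit described above.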