lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Dash Confusion in QueryParser - Bug? Feature?
Date Mon, 20 Oct 2003 16:57:59 GMT
On Wednesday, October 15, 2003, at 10:24  AM, Michael Giles wrote:
> So how do we move this issue forward.  I can't think of a single case 
> where a "-" with no whitespace on either side (i.e. t-shirt, Wal-Mart) 
> should be interpreted as a NOT command.  Is there a feeling that 
> changing the interpretation of such cases is a break in compatibility? 
>  I agree that it will change behavior, but I think that it will change 
> it for the better (i.e. fix it).  The current behavior is really 
> broken (and very frustrating for a user trying to search).

I looked at the patch here:

	http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23838

I'm not entirely satisfied with it.  I'm of the opinion that we should 
only change QueryParser to fix the behavior of operators nestled within 
text with no surrounding whitespace.  The provided patch only works 
with the "-" character, but what about "Wal+Mart"?  Shouldn't we keep 
that together also and hand it to the analyzer?

I'm not convinced at all that we should change the StandardTokenizer to 
not split on dash.  If only QueryParser was fixed and handed "Wal-Mart" 
to the StandardAnalyzer, it would be split the same way as during 
indexing and searches would return the expected hits.

Thoughts?  I'd like to see this fixed, but in a way that makes the most 
general sense.

Thanks,
	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message