lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: jaspq: dashed numerical values tokenized differently
Date Wed, 03 Nov 2004 12:38:32 GMT

On Nov 3, 2004, at 5:03 AM, Daniel Taurat wrote:
>> Query parser was changed to treat '-' within words as part of the 
>> word.
>> Before that change a query 'dash-test' was parsed as 'dash AND NOT
> test'.
>> Now QP reads one word 'dash-test' which is analyzed. If the analyzer
>> splits that to more than one token (standard analyzer does) a phrase
>> query is created.
>> The difference you see comes from standard analyzer which tokenizes
>> dash-test dash-123 to tokens dash, test and dash-123.
>> Prefix queries aren't analyzed.
>
> So you say that dash-123 is a prefix query whereas dash-test is not?
> I found also (with Luke) that dash-anystring123 is not tokenized as
> well.
> What exactly are the criteria for Lucene to decide what a prefix is and
> what not?

Anything that ends with an asterisk is parsed as a PrefixQuery, as long 
as it does not have other wildcard characters.  If it has other 
wildcard characters or the asterisk is not at the end, then it is 
parsed as a WildcardQuery.

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message