lucene-java-user mailing list archives

From Mathieu Lecarme <math...@garambrogne.net>
Subject Re: Wildcard query with untokenized punctuation (again)
Date Thu, 14 Jun 2007 13:38:42 GMT
If you don't use the same tokenizer for indexing and searching, you will
run into problems like this.
Mixing exact match (") with a wildcard (*) is a strange idea.
Typographic rules say that a comma is followed by a space, no?
Is your field tokenized?

M.

Renaud Waldura wrote:
> My very simple analyzer produces tokens made of digits and/or letters only.
> Anything else is discarded. E.g. the input "smith,anna" gets tokenized as 2
> tokens, first "smith" then "anna".
>  
> Say I have indexed documents that contained both "smith,anna" and
> "smith,annanicole". To find them, I enter the query <<smith,ann*>>. The
> stock Lucene 2.0 query parser produces a PrefixQuery for the single token
> "smith,ann". This token doesn't exist in my index, and I don't get a match.
>  
> I have found some references to this:
> http://www.nabble.com/Wildcard-query-with-untokenized-punctuation-tf3378386.html
> but I don't understand how I can fix it. Comma-separated terms like this can
> appear in any field; I don't think I can create an untokenized field.
>  
> Really what I would like in this case is for the comma to be considered
> whitespace, and the query to be parsed to <<+smith +ann*>>. Any way I can
> do that?
>  
> --Renaud
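
One way to get the behaviour asked for above is to rewrite the raw query string before it reaches the query parser, applying the same "letters and digits only" rule the analyzer uses at index time. The following is a minimal sketch under that assumption. The class and method names (PunctuationAsWhitespace, tokenize, rewriteQuery) are hypothetical and not part of Lucene, and the code deliberately ignores the rest of the query syntax (field prefixes, quotes, boolean operators).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class: illustrates treating punctuation
// as whitespace in the query string before handing it to a query parser.
final class PunctuationAsWhitespace {

    // Mirrors the analyzer described in the message: a token is a run of
    // letters and/or digits, everything else is discarded.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Splits the raw query wherever the analyzer would split, but keeps the
    // wildcard characters * and ? attached to their clause, then marks every
    // clause as required with a leading '+'.
    static String rewriteQuery(String rawQuery) {
        StringBuilder out = new StringBuilder();
        StringBuilder clause = new StringBuilder();
        for (int i = 0; i <= rawQuery.length(); i++) {
            // A trailing space acts as a sentinel that flushes the last clause.
            char c = i < rawQuery.length() ? rawQuery.charAt(i) : ' ';
            if (Character.isLetterOrDigit(c) || c == '*' || c == '?') {
                clause.append(c);
            } else if (clause.length() > 0) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append('+').append(clause);
                clause.setLength(0);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Index side: both documents yield the token "smith" plus a name token.
        System.out.println(tokenize("smith,anna smith,annanicole"));
        // Query side: the comma now separates two required clauses.
        System.out.println(rewriteQuery("smith,ann*")); // prints "+smith +ann*"
    }
}
```

The rewritten string can then be passed to the stock query parser, which should produce a term clause for "smith" and a prefix clause for "ann", both of which exist in the index. A query consisting only of wildcard characters would come out as "+*", so a real implementation would need to guard against that case.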



