lucene-java-user mailing list archives

From "Renaud Waldura" <renaud.wald...@library.ucsf.edu>
Subject Wildcard query with untokenized punctuation (again)
Date Wed, 13 Jun 2007 23:13:02 GMT
My very simple analyzer produces tokens made of digits and/or letters only.
Anything else is discarded. E.g. the input "smith,anna" is tokenized into two
tokens: first "smith", then "anna".
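
Roughly speaking, it behaves like this sketch (simplified; the class names
below are just for illustration):

    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.CharTokenizer;
    import org.apache.lucene.analysis.TokenStream;

    // Keeps runs of letters/digits and discards everything else, so
    // "smith,anna" comes out as the two tokens "smith" and "anna".
    public class LetterOrDigitAnalyzer extends Analyzer {

      static class LetterOrDigitTokenizer extends CharTokenizer {
        LetterOrDigitTokenizer(Reader input) {
          super(input);
        }
        protected boolean isTokenChar(char c) {
          return Character.isLetterOrDigit(c);
        }
      }

      public TokenStream tokenStream(String fieldName, Reader reader) {
        return new LetterOrDigitTokenizer(reader);
      }
    }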
 
Say I have indexed documents containing both "smith,anna" and
"smith,annanicole". To find them, I enter the query <<smith,ann*>>. The
stock Lucene 2.0 query parser produces a PrefixQuery for the single token
"smith,ann". That token doesn't exist in my index, so I get no match.
 
I have found some references to this:
http://www.nabble.com/Wildcard-query-with-untokenized-punctuation-tf3378386.html
but I don't understand how I can fix it. Comma-separated terms like this can
appear in any field; I don't think I can create an untokenized field.
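
To make the problem concrete, here is roughly what I see in a toy example
(the "author" field name is just an example, and LetterOrDigitAnalyzer is
the sketch above):

    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;

    public class ParseDemo {
      public static void main(String[] args) throws Exception {
        QueryParser parser = new QueryParser("author", new LetterOrDigitAnalyzer());
        Query q = parser.parse("smith,ann*");
        // Prints author:smith,ann* -- a PrefixQuery on the single term
        // "smith,ann", which never made it into the index.
        System.out.println(q);
      }
    }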
 
Really what I would like in this case is for the comma to be considered
whitespace, and the query to be parsed to <<+smith +ann*>>. Any way I can do
that?
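
For what it's worth, typing the whitespace-and-plus form by hand does
produce the query I'm after (continuing the toy example above):

    Query wanted = parser.parse("+smith +ann*");
    System.out.println(wanted);  // +author:smith +author:ann*

I just want the comma form to end up parsed the same way.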
 
--Renaud
 
 
