lucene-java-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Wildcard query with untokenized punctuation (again)
Date Thu, 14 Jun 2007 13:43:26 GMT
Gotta agree with Erick here... the best idea is just to preprocess the query
before sending it to the QueryParser.
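
A minimal sketch of that preprocessing step, assuming the comma is the only
punctuation that needs handling. The class and method names
(QueryPreprocessor, commasToWhitespace) are hypothetical and not part of
Lucene; the code only rewrites the raw query string and leaves the actual
parsing to whatever QueryParser setup you already have.

```java
public class QueryPreprocessor {

    // Hypothetical helper, not a Lucene API: replaces every comma in the
    // raw user query with a space, so the query parser sees separate terms
    // instead of one term with embedded punctuation.
    public static String commasToWhitespace(String rawQuery) {
        return rawQuery.replace(',', ' ');
    }

    public static void main(String[] args) {
        String cleaned = commasToWhitespace("smith,ann*");
        // The cleaned string is what would be handed to the query parser.
        System.out.println(cleaned); // prints "smith ann*"
    }
}
```

Whether the cleaned query ends up as <<+smith +ann*>> rather than
<<smith ann*>> depends on the parser's default operator, so that still has
to be set to AND on the QueryParser side.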

My first thought is always to get out the sledgehammer...

- Mark

Erick Erickson wrote:
> Well, perhaps the simplest thing would be to pre-process the query and
> make the comma into a whitespace before sending anything to the
> query parser. I don't know how generalizable that sort of solution is in
> your problem space though....
>
> Best
> Erick
>
> On 6/13/07, Renaud Waldura <renaud.waldura@library.ucsf.edu> wrote:
>>
>> My very simple analyzer produces tokens made of digits and/or letters only.
>> Anything else is discarded. E.g. the input "smith,anna" gets tokenized as
>> 2 tokens, first "smith" then "anna".
>>
>> Say I have indexed documents that contained both "smith,anna" and
>> "smith,annanicole". To find them, I enter the query <<smith,ann*>>. The
>> stock Lucene 2.0 query parser produces a PrefixQuery for the single token
>> "smith,ann". This token doesn't exist in my index, and I don't get a match.
>>
>> I have found some references to this:
>>
>> http://www.nabble.com/Wildcard-query-with-untokenized-punctuation-tf3378386.html
>>
>> but I don't understand how I can fix it. Comma-separated terms like this can
>> appear in any field; I don't think I can create an untokenized field.
>>
>> Really what I would like in this case is for the comma to be considered
>> whitespace, and the query to be parsed to <<+smith +ann*>>. Any way I can do
>> that?
>>
>> --Renaud
>>
>>
>>
>


