lucene-java-user mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Re: Question for Wildcard Search:
Date Wed, 22 Jun 2005 14:48:55 GMT
>Markus Atteneder writes:
>>  It is possible to search with the "*" and "?" wildcards at the
>>  end and in the middle of a search string, but not at the beginning. Is there
>>  a way to do this?
>>
>Sure. Simply index reversed words.
>
>The reason why QP prohibits wildcards at the beginning is performance.
>If there is some prefix, only terms starting with that prefix need to be
>examined to see whether they match the wildcard.
>IIRC you can use wildcards at the beginning if you create the query using
>the API, but it will be slow.
>
>So the performant solution is to have an additional field containing the
>tokens in reversed character order.
>Won't help for *foo* though.
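
The reversed-field idea quoted above could be sketched roughly as follows. This is plain Java, not the Lucene API: the class ReversedTermSketch and its methods are hypothetical names, and a sorted TreeSet stands in for the term dictionary of the extra reversed field. The point is only that a leading wildcard such as *ing turns into a cheap prefix scan for "gni" over the reversed terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; this is not how Lucene stores terms.
class ReversedTermSketch {
    // Sorted reversed terms, standing in for the term dictionary
    // of an additional "reversed" field.
    private final TreeSet<String> reversedTerms = new TreeSet<>();

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    void add(String term) {
        reversedTerms.add(reverse(term));
    }

    // Answers the query "*suffix" with a prefix scan over the reversed terms.
    List<String> endingWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> hits = new ArrayList<>();
        // All terms sharing the prefix are contiguous in sorted order.
        for (String t : reversedTerms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break; // left the prefix range, nothing further can match
            }
            hits.add(reverse(t));
        }
        return hits;
    }

    public static void main(String[] args) {
        ReversedTermSketch idx = new ReversedTermSketch();
        for (String w : new String[] {"searching", "indexing", "wildcard", "string"}) {
            idx.add(w);
        }
        System.out.println(idx.endingWith("ing"));
    }
}
```

Only the terms inside the prefix range are visited, which is the performance argument made above. A query like *foo* has no fixed prefix in either direction, so this sketch does not help there.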

You can also index n-grams, say 3-grams. Every word gets tokenized and
indexed as a sequence of overlapping three-letter substrings. E.g. "tokenized"
would be indexed as "tok" "oke" "ken" "eni" "niz" "ize" "zed".

That would help you find *foo*, but not *ha*, since a two-letter
fragment is shorter than any indexed 3-gram.
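
A rough sketch of the 3-gram approach, again in plain Java and not against the Lucene API. TrigramSketch, its postings map, and the final substring check are assumptions made for illustration: the check is there because a word can contain every 3-gram of a fragment without containing the fragment itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not a Lucene analyzer or query.
class TrigramSketch {
    static final int N = 3;

    // Maps each 3-gram to the set of words that contain it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Overlapping N-letter substrings; words shorter than N yield none.
    static List<String> ngrams(String word) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + N <= word.length(); i++) {
            grams.add(word.substring(i, i + N));
        }
        return grams;
    }

    void add(String word) {
        for (String g : ngrams(word)) {
            postings.computeIfAbsent(g, k -> new TreeSet<>()).add(word);
        }
    }

    // Answers the query "*fragment*" for fragments of at least N letters.
    Set<String> containing(String fragment) {
        if (fragment.length() < N) {
            throw new IllegalArgumentException(
                "fragment is shorter than the indexed n-gram size: " + fragment);
        }
        Set<String> candidates = null;
        for (String g : ngrams(fragment)) {
            Set<String> words = postings.getOrDefault(g, Collections.emptySet());
            if (candidates == null) {
                candidates = new TreeSet<>(words);
            } else {
                candidates.retainAll(words); // keep words containing every gram
            }
        }
        // The grams may occur in a word without being adjacent, so verify.
        candidates.removeIf(w -> !w.contains(fragment));
        return candidates;
    }

    public static void main(String[] args) {
        TrigramSketch idx = new TrigramSketch();
        for (String w : new String[] {"tokenized", "broken", "wildcard"}) {
            idx.add(w);
        }
        System.out.println(TrigramSketch.ngrams("tokenized"));
        System.out.println(idx.containing("ken"));
    }
}
```

In this sketch containing("ha") is rejected outright, which mirrors the *ha* limitation: no indexed term is only two letters long, so there is nothing to look up.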

-- Ken
-- 
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-470-9200
