lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Question for Wildcard Search:
Date Wed, 22 Jun 2005 13:05:24 GMT

On Jun 22, 2005, at 4:01 AM, Morus Walter wrote:

> Markus Atteneder writes:
>
>> It is possible to search with the "*" and "?" wildcards at the end and
>> in the middle of a search string, but not at the beginning. Is there a
>> way to do this?
>>
> Sure. Simply index reversed words.
>
> The reason why QueryParser (QP) prohibits wildcards at the beginning is
> performance. If there is a prefix, only terms starting with that prefix
> need to be examined to see whether they match the wildcard.
> IIRC, you can use a leading wildcard if you build the query through the
> API, but it will be slow.
>
> So the performant solution is to have an additional field containing the
> tokens in reversed character order.
> That won't help for *foo*, though.
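A minimal sketch of the reversed-field idea, written in plain Java so it runs without Lucene on the classpath. The TreeSet stands in for the sorted term dictionary of an extra field that holds the reversed tokens; the class and method names (ReversedFieldSketch, reversedPrefixFor, leadingWildcard) are hypothetical and are not Lucene API. In Lucene itself, the reversed string would be the text of a PrefixQuery against that extra field. As noted above, "*foo*" is not handled: the sketch simply returns no hits for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch of the "reversed words" approach: a second sorted term set holds
// every token spelled backwards, so "*ing" becomes a prefix scan for "gni".
// The TreeSet is only a stand-in for a sorted term dictionary.
public class ReversedFieldSketch {

    // Reverse a token character by character (hypothetical helper).
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Turn a pattern of the form "*suffix" into the prefix to look up in the
    // reversed terms. Returns null for any other shape, including "*foo*".
    static String reversedPrefixFor(String pattern) {
        if (pattern.length() < 2 || pattern.charAt(0) != '*') return null;
        String rest = pattern.substring(1);
        if (rest.indexOf('*') >= 0 || rest.indexOf('?') >= 0) return null;
        return reverse(rest);
    }

    // Visit only the reversed terms that start with the prefix, as a prefix
    // query walks a sorted term list, and map each hit back to its spelling.
    static List<String> leadingWildcard(TreeSet<String> reversedTerms, String pattern) {
        List<String> hits = new ArrayList<>();
        String prefix = reversedPrefixFor(pattern);
        if (prefix == null) return hits;
        for (String t : reversedTerms.tailSet(prefix)) {
            if (!t.startsWith(prefix)) break; // sorted order: nothing further matches
            hits.add(reverse(t));
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> reversed = new TreeSet<>();
        for (String token : new String[] {"walking", "talk", "singing", "sing", "ring"}) {
            reversed.add(reverse(token));
        }
        System.out.println(leadingWildcard(reversed, "*ing"));
    }
}
```

Running main prints [singing, walking, ring, sing]: only the reversed terms sharing the prefix "gni" are visited, and the loop stops at "klat", the first term past that range.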

There is a technique from the book Managing Gigabytes that I mentioned
here before (in February). Here's a snippet from that post:

----
...technique I found in the book Managing Gigabytes, making "*string*"
queries drastically more efficient to search (though it also increases
index size). Take the term "cat". It would be indexed as all of its
rotations, with an end-of-word marker added:

     cat$
     at$c
     t$ca
     $cat

The query "*at*" would be preprocessed and rotated so that the wildcards
collapse to the end, giving a search for "at*" as a PrefixQuery. A
wildcard in the middle of a string, as in "c*t", would become a prefix
query for "t$c*".
----
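The rotation scheme in the snippet (often called a permuterm index) can be sketched the same way. This is a plain-Java illustration under stated assumptions: '$' never occurs inside a term, and only patterns with a single '*' or of the form "*x*" are handled (no '?'). PermutermSketch and its methods are hypothetical names, and the sorted set again stands in for an indexed field rather than any Lucene class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch of the rotated-term index: each term gets an end marker '$' and all
// rotations go into one sorted set. A query is rotated so its '*' ends up
// last, which turns it into a prefix lookup. Assumes '$' is not in any term.
public class PermutermSketch {

    static final char END = '$';

    // All rotations of term + '$', e.g. "cat" -> cat$, at$c, t$ca, $cat.
    static List<String> rotations(String term) {
        String s = term + END;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            out.add(s.substring(i) + s.substring(0, i));
        }
        return out;
    }

    // Rewrite a pattern into the prefix to search for:
    //   "*at*" -> "at", "c*t" -> "t$c", "*at" -> "at$", "ca*" -> "$ca".
    // Returns null for shapes this sketch does not handle ('?', other multi-'*').
    static String toPrefix(String pattern) {
        if (pattern.indexOf('?') >= 0) return null;
        if (pattern.length() > 2 && pattern.startsWith("*") && pattern.endsWith("*")) {
            String inner = pattern.substring(1, pattern.length() - 1);
            return inner.indexOf('*') >= 0 ? null : inner;
        }
        int star = pattern.indexOf('*');
        if (star < 0 || star != pattern.lastIndexOf('*')) return null;
        String s = pattern + END; // e.g. "c*t$"
        // Move the text after '*' to the front; the '*' itself is dropped.
        return s.substring(star + 1) + s.substring(0, star);
    }

    // Map a stored rotation back to its term: "t$ca" -> "cat".
    static String unrotate(String rotation) {
        int d = rotation.indexOf(END);
        return rotation.substring(d + 1) + rotation.substring(0, d);
    }

    // Prefix scan over the sorted rotations; hits go into a set because a
    // term may contain the searched substring more than once.
    static TreeSet<String> search(TreeSet<String> index, String pattern) {
        TreeSet<String> hits = new TreeSet<>();
        String prefix = toPrefix(pattern);
        if (prefix == null) return hits;
        for (String r : index.tailSet(prefix)) {
            if (!r.startsWith(prefix)) break;
            hits.add(unrotate(r));
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> index = new TreeSet<>();
        for (String term : new String[] {"cat", "cot", "bat", "dog"}) {
            index.addAll(rotations(term));
        }
        System.out.println(rotations("cat"));      // [cat$, at$c, t$ca, $cat]
        System.out.println(search(index, "*at*")); // [bat, cat]
        System.out.println(search(index, "c*t"));  // [cat, cot]
    }
}
```

The result set matters for terms like "atat", whose rotations "atat$" and "at$at" both start with "at"; without deduplication a "*at*" search would report it twice.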

Has anyone tried this technique with Lucene?

     Erik



