lucene-java-user mailing list archives

From Erik Hatcher <>
Subject Re: Question for Wildcard Search:
Date Wed, 22 Jun 2005 13:05:24 GMT

On Jun 22, 2005, at 4:01 AM, Morus Walter wrote:

> Markus Atteneder writes:
>> It is possible to search with the "*" and "?" wildcards at the end
>> and in the middle of a search string, but not at the beginning. Is
>> there a way to do this?
> Sure. Simply index reversed words.
> The reason QueryParser prohibits wildcards at the beginning is
> performance. If there is some prefix, only terms starting with that
> prefix need to be checked against the wildcard.
> IIRC you can use wildcards at the beginning if you create the query
> using the API, but it will be slow.
> So the performant solution is to have an additional field containing
> the tokens in reversed character order.
> That won't help for *foo* though.
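
A minimal sketch of the reversed-field idea quoted above, in plain Java with no Lucene calls. The class and method names are made up for illustration, and the list scan only stands in for a prefix scan over the terms of the reversed field:

```java
import java.util.List;
import java.util.stream.Collectors;

public class ReversedTerms {
    // Reverse a token's characters; at index time this value would go
    // into the additional "reversed" field (field name is hypothetical).
    public static String reverse(String token) {
        return new StringBuilder(token).reverse().toString();
    }

    // Turn a pattern with a single leading '*' (e.g. "*ing") into the
    // prefix to look up in the reversed field ("gni").
    public static String reversedPrefix(String pattern) {
        if (!pattern.startsWith("*")
                || pattern.indexOf('*', 1) >= 0
                || pattern.indexOf('?') >= 0) {
            throw new IllegalArgumentException(
                "expected a single leading '*': " + pattern);
        }
        return reverse(pattern.substring(1));
    }

    // Stand-in for a prefix scan: keep the terms whose reversed form
    // starts with the reversed suffix.
    public static List<String> search(List<String> terms, String pattern) {
        String prefix = reversedPrefix(pattern);
        return terms.stream()
                .filter(t -> reverse(t).startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> terms = List.of("searching", "index", "indexing", "string");
        System.out.println(search(terms, "*ing")); // [searching, indexing, string]
    }
}
```

As the quoted message says, this covers a leading wildcard only; a pattern like *foo* has no fixed prefix in either direction.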

There is a technique from the book Managing Gigabytes that I've  
mentioned here before (in February).  Here's a snippet from it:

...technique I found in the book Managing Gigabytes, making  
"*string*" queries drastically more efficient for searching (though  
also impacting index size).  Take the term "cat".  It would be  
indexed with all rotated variations with an end of word marker added:

    cat$
    at$c
    t$ca
    $cat

The query for "*at*" would be preprocessed and rotated such that the  
wildcards are collapsed at the end to search for "at*" as a  
PrefixQuery.  A wildcard in the middle of a string like "c*t" would  
become a prefix query for "t$c*".
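
To make the rotation and the query rewriting concrete, here is a rough sketch in plain Java. It does not touch the Lucene API; the class and method names are invented for illustration, "$" is assumed as the end of word marker (as in the "t$c*" example), and only "*" is handled, not "?":

```java
import java.util.ArrayList;
import java.util.List;

public class RotatedTerms {
    static final char END = '$'; // assumed end-of-word marker

    // All rotations of term + '$'. These are the values that would be
    // indexed for the term.
    public static List<String> rotations(String term) {
        String marked = term + END;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < marked.length(); i++) {
            out.add(marked.substring(i) + marked.substring(0, i));
        }
        return out;
    }

    // Rewrite a wildcard pattern into the prefix to search for among the
    // rotated terms. Handles "*x*" and patterns with exactly one '*'.
    public static String toPrefix(String pattern) {
        if (pattern.length() >= 3 && pattern.startsWith("*") && pattern.endsWith("*")) {
            String inner = pattern.substring(1, pattern.length() - 1);
            if (inner.indexOf('*') >= 0) {
                throw new IllegalArgumentException("unsupported pattern: " + pattern);
            }
            return inner; // "*at*" -> prefix "at"
        }
        int star = pattern.indexOf('*');
        if (star < 0 || star != pattern.lastIndexOf('*')) {
            throw new IllegalArgumentException("unsupported pattern: " + pattern);
        }
        // Rotate so the wildcard ends up last: "c*t" -> "t$c"
        return pattern.substring(star + 1) + END + pattern.substring(0, star);
    }

    // Stand-in for running a prefix query against the rotated terms.
    public static boolean matches(String term, String pattern) {
        String prefix = toPrefix(pattern);
        return rotations(term).stream().anyMatch(r -> r.startsWith(prefix));
    }

    public static void main(String[] args) {
        System.out.println(rotations("cat"));      // [cat$, at$c, t$ca, $cat]
        System.out.println(toPrefix("c*t"));       // t$c
        System.out.println(matches("cat", "*at*")); // true
    }
}
```

Each term of length n produces n + 1 indexed rotations, which is where the index size cost comes from.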

Anyone tried this technique with Lucene?

