lucenenet-user mailing list archives

From: Noel Lysaght <lysag...@outlook.com>
Subject: Re: How to? speed up wildcard queries
Date: Thu, 27 Dec 2012 22:04:40 GMT
You're not really looking for a wildcard query. I would think you need to build an index in which
every forward substring of each word is indexed. For example, take the word "small": you
need to index it as small, sm, sma, smal, ma, mal, mall, al, all, etc.
I'm pretty sure there is a sample analyzer/tokenizer that can do this. You end up with a bigger
index, but with a lot more power for your searches.
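A minimal sketch of the substring (n-gram) idea above, written in Python outside Lucene entirely: a plain dict stands in for the index, and `ngrams`, `build_index` and `search_substring` are hypothetical helpers, not Lucene.Net APIs. Every substring of at least `min_gram` characters is stored as its own term, so a `*fragment*` search becomes one exact term lookup instead of a scan over the term dictionary. Lucene's contrib analyzers include n-gram tokenizers/filters (e.g. `NGramTokenFilter`) that do this inside an analyzer; check the exact constructor arguments for your Lucene.Net version.

```python
from collections import defaultdict


def ngrams(word: str, min_gram: int = 2) -> list[str]:
    """All substrings of `word` that are at least `min_gram` characters long."""
    grams = []
    for start in range(len(word)):
        for end in range(start + min_gram, len(word) + 1):
            grams.append(word[start:end])
    return grams


def build_index(docs: dict[int, str], min_gram: int = 2) -> dict[str, set[int]]:
    """Toy inverted index: every n-gram of every word maps to the ids of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            for gram in ngrams(word, min_gram):
                index[gram].add(doc_id)
    return index


def search_substring(index: dict[str, set[int]], fragment: str) -> set[int]:
    """Equivalent of *fragment*: a single exact-term lookup, no wildcard expansion."""
    return set(index.get(fragment.lower(), set()))


index = build_index({1: "Small metal pen", 2: "Large mall bag"})
print(ngrams("small"))                 # ['sm', 'sma', 'smal', 'small', 'ma', 'mal', 'mall', 'al', 'all', 'll']
print(search_substring(index, "mal"))  # {1, 2}
print(search_substring(index, "etal")) # {1}
```

The trade-off is the one mentioned above: the index grows roughly with the square of the word length, which is why real n-gram filters usually also take a maximum gram size. Fragments shorter than `min_gram` will not match anything in this sketch.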

Cheers
Noel




On 27 Dec 2012, at 19:41, "Floris van Gog" <F.vanGog@xindao.nl> wrote:

> Hello,
> 
> With a few examples taken from blogs (which I do not remember, but if it was yours, thanks!),
> I have managed to get Lucene.Net working as a small search-engine web service behind a
> website. I also added some home-made faceting (as I guess Solr could have done, though less
> elaborately). I rolled my own because of pricing (various price lists and price brackets) and
> stock-level (future stock) filtering requirements.
> 
> Even though it works, it is far from optimal (in my eyes), and most of the pain is in the
> wildcard queries. Because the search engine helps customers find products, every term in a
> search query is automatically prefixed and suffixed with a *. Leaving out these wildcards
> seriously limits the usefulness of the free-text search, and they are a business requirement.
> 
> [The search uses RAMDirectory storage, and the tests below are always run in sequence on a
> single CPU. Documents are never removed from the index.]
> A trailing * is still acceptable: I can do about 800 searches/second on a 1,500-document
> index. The documents contain little text (a short description of maybe 2-3 lines).
> However, the leading * makes throughput drop to about 100 searches/second.
> For comparison, with no wildcards I get about 4,000 searches/second, and if I filter only on
> facets, I get about 60,000 searches/second.
> 
> The query is a hand-built BooleanQuery containing WildcardQuery clauses on 2 fields of the
> document, combined with SHOULD.
> 
> Is there a way to speed up wildcard queries with a leading *? I am currently thinking of adding
> a field to the document with the text reversed, and applying only a trailing wildcard * to it.
> Theoretically this should give me about 400 searches/second.
> 
> Any input is appreciated,
> Floris
> DISCLAIMER:
> The information contained in this communication is confidential and is intended solely
> for the use of the individual or entity to whom it is addressed. If you have received
> it by mistake, please let us know by email reply and delete it from your system. You
> should not copy, disclose or distribute this communication without the authority of
> Xindao BV. Xindao BV is neither liable for the proper and complete transmission of the
> information contained in this communication nor for any delay in its receipt. Xindao BV
> does not guarantee that the integrity of this communication has been maintained nor that
> the communication is free of viruses, interceptions or interference.
> 
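A minimal sketch, in Python rather than Lucene.Net, of why the trailing * in the quoted measurements is cheap while the leading * is not, and of the reversed-field idea. It assumes a simplified model in which the term dictionary is just a sorted list of strings: a prefix can be located with a binary search followed by a short contiguous scan, whereas a suffix match has nothing to seek to and must test every term. The helpers `prefix_terms`, `suffix_terms_by_scan` and `suffix_terms_via_reversed` are hypothetical names, not Lucene APIs. Note that reversal only turns a leading-only pattern (`*er`) into a trailing one (`re*`); a term wrapped on both sides (`*er*`) still has a leading wildcard after reversal.

```python
import bisect


def prefix_terms(sorted_terms: list[str], prefix: str) -> list[str]:
    """Trailing wildcard (prefix*): binary-search to the first candidate, then scan a contiguous run."""
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for i in range(start, len(sorted_terms)):
        term = sorted_terms[i]
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        matches.append(term)
    return matches


def suffix_terms_by_scan(sorted_terms: list[str], suffix: str) -> list[str]:
    """Leading wildcard (*suffix): there is nothing to seek to, so every term is tested."""
    return [term for term in sorted_terms if term.endswith(suffix)]


def suffix_terms_via_reversed(reversed_terms: list[str], suffix: str) -> list[str]:
    """Same matches as suffix_terms_by_scan, answered as a prefix search on reversed terms."""
    return [term[::-1] for term in prefix_terms(reversed_terms, suffix[::-1])]


terms = sorted(["bottle", "cardholder", "holder", "opener", "pen", "pencil", "pendrive"])
reversed_terms = sorted(term[::-1] for term in terms)  # stands in for a hypothetical "reversed" field

print(prefix_terms(terms, "pen"))                       # ['pen', 'pencil', 'pendrive']
print(suffix_terms_by_scan(terms, "er"))                # ['cardholder', 'holder', 'opener']
print(suffix_terms_via_reversed(reversed_terms, "er"))  # ['holder', 'cardholder', 'opener']
```

In an actual index the reversed text would live in its own field, and the query-side fragment would be reversed before building the trailing-wildcard query against that field.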
