lucenenet-user mailing list archives

From Floris van Gog
Subject How to? speed up wildcard queries
Date Thu, 27 Dec 2012 19:40:42 GMT
With a few examples taken from blogs (I do not remember which, but if one was yours, thanks!)
I have managed to get a small search engine web service working, to be used behind
a website. I also added some homemade faceting to it (as I guess Solr could have done,
but not as elaborate). The reason to roll my own was pricing (various price lists
and brackets) and stock level (future stock) filtering requirements.
Even though it works, it is far from optimal (in my eyes), and most of the hurt is in the
wildcard queries. As the searcher will help customers find products, all terms in a search query
are automatically prefixed and postfixed with a *. Not adding these wildcards seriously limits
the usefulness of the free-text search. This is a business requirement.
[The search uses RAMDirectory storage, and the tests below are always run sequentially on
a single CPU. Documents are never removed from the index.]
The postfix * is still somewhat OK, as I can do about 800 searches/second on a 1500-document
index. The documents do not contain much text (a short description, maybe 2-3 lines).
However, the prefix * makes the search throughput drop to about 100 searches/second.
To put this in perspective: with no wildcards I can get about 4000 searches/second, and
if I only use facets to filter, I can do about 60,000 searches/second.
The query is a manually built BooleanQuery containing WildcardQuery clauses on 2 fields
in the document, combined using SHOULD.
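To illustrate why the leading * is likely the expensive part, here is a minimal Python sketch
of a SHOULD-combined wildcard match over two fields. It is not Lucene.NET code: the toy INDEX,
the field names, and the functions wildcard_docs and should_query are all hypothetical. It only
models the general idea that a wildcard pattern is expanded against a field's terms, and that
without a literal prefix every term has to be tested.

```python
from fnmatch import fnmatchcase

# Hypothetical toy index: field -> term -> set of document ids.
INDEX = {
    "name": {"notebook": {1}, "bookmark": {2}, "pen": {3}},
    "description": {"book": {1, 2}, "ballpoint": {3}, "textbook": {4}},
}

def wildcard_docs(field, pattern):
    """Return (matching doc ids, number of terms tested) for one field."""
    terms = INDEX[field]
    # Literal text before the first '*'; empty when the pattern starts with '*'.
    prefix = pattern.split("*", 1)[0]
    # Only terms sharing that prefix need testing. A sorted term dictionary
    # could seek to them directly; this sketch simply filters.
    candidates = [t for t in terms if t.startswith(prefix)]
    docs = set()
    for term in candidates:
        if fnmatchcase(term, pattern):
            docs |= terms[term]
    return docs, len(candidates)

def should_query(pattern, fields=("name", "description")):
    """OR the per-field matches together, as SHOULD clauses would."""
    docs, tested = set(), 0
    for field in fields:
        field_docs, field_tested = wildcard_docs(field, pattern)
        docs |= field_docs
        tested += field_tested
    return docs, tested
```

In this toy data, should_query("book*") only has to test the two terms that start with "book",
while should_query("*book*") tests all six terms across both fields.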
Is there a way to speed up prefix * wildcard queries somehow? I am currently thinking along
the lines of adding a field to the document with the text reversed, and only applying a postfix
wildcard * to it. Theoretically this should give me about 400 searches/second.
Any input is appreciated,