lucene-java-user mailing list archives

From "Hardy Ferentschik" <ha...@ferentschik.de>
Subject substring indexing to avoid 'TooManyClauses' exception
Date Mon, 12 Nov 2007 21:44:40 GMT
Hi,

I have a question regarding the way I got around the 'TooManyClauses'
exception when using wildcard queries
(http://wiki.apache.org/lucene-java/LuceneFAQ#head-06fafb5d19e786a50fb3dfb8821a6af9f37aa831).


I am using Lucene in conjunction with Hibernate Search
(http://www.hibernate.org/410.html). I am indexing 'Company' objects
which contain multiple attributes, and the application supports different
types of searches.

One type of search is a right-hand truncated (wildcard query) search of
the company name. If, e.g., the user searches for 'M', I initially
constructed an 'M*' query. I have about 250,000 companies in the index.
Without any modifications I get the 'TooManyClauses' exception, and I
initially kept increasing the 'maxClauseCount'. It works, but performance
was terrible. I haven't tried working with a filter, but instead decided
to try a different approach: I index all leading substrings (prefixes) of
a string, e.g. 'Foo' would be indexed as 'F', 'Fo' and 'Foo'.
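
To make the idea concrete, here is a minimal sketch in plain Java of
indexing every prefix of a name so that a truncated search becomes a
single exact-term lookup. It does not use the Lucene or Hibernate Search
APIs; the class PrefixIndexSketch, its methods and the in-memory map are
hypothetical stand-ins for the real analyzer and index, and no
lowercasing or tokenizing is done.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Lucene or Hibernate Search.
class PrefixIndexSketch {

    // Returns every leading substring of the input: "Foo" -> [F, Fo, Foo].
    static List<String> prefixes(String term) {
        List<String> result = new ArrayList<>();
        for (int end = 1; end <= term.length(); end++) {
            result.add(term.substring(0, end));
        }
        return result;
    }

    // Toy inverted index: each prefix maps to the names that start with it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Index time: store the name under each of its prefixes.
    void add(String companyName) {
        for (String prefix : prefixes(companyName)) {
            postings.computeIfAbsent(prefix, k -> new TreeSet<>()).add(companyName);
        }
    }

    // Query time: one exact lookup, no expansion of 'M*' into many terms.
    Set<String> search(String typed) {
        return postings.getOrDefault(typed, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        PrefixIndexSketch index = new PrefixIndexSketch();
        index.add("Foo");
        index.add("Fob");
        index.add("Max");
        System.out.println(prefixes("Foo"));     // prints [F, Fo, Foo]
        System.out.println(index.search("Fo"));  // prints [Fob, Foo]
    }
}
```

The trade-off is the one described in this message: a name of length n
produces n index terms instead of one, so the index grows, while the
query side only ever looks up a single term.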

I got rid of the 'TooManyClauses' exception and performance improved by
an order of magnitude, but I would like to get some feedback from other
users on whether this is a good approach or not.

Of course the index size increased, but that was no issue in this case.  
Are there any potential problems with ranking/scoring?

Thanks for any feedback.

--Hardy


-- 
Hartmut Ferentschik
Ekholmsv.339 ,1, 127 45 Skärholmen, Sweden
Phone: +46 855 923 676 (h); +46 704 225 097 (m)


