lucene-java-user mailing list archives

From "Hardy Ferentschik" <>
Subject substring indexing to avoid 'TooManyClauses' exception
Date Mon, 12 Nov 2007 21:44:40 GMT

I have a question about how I got around the 'TooManyClauses'
exception when using wildcard queries.

I am using Lucene in conjunction with Hibernate Search. I am indexing
'Company' objects, which contain multiple attributes, and the application
supports different types of searches.

One type of search is a right-hand truncated (wildcard query) search on
the company name. For example, if the user searches for 'M', I initially
constructed an 'M*' query. I have about 250,000 companies in the index.
Without any modifications I got the 'TooManyClauses' exception, so at first
I kept increasing 'maxClauseCount'. That worked, but performance was
terrible. I haven't tried working with a filter; instead I decided to try a
different approach: I index all leading substrings (prefixes) of a string,
e.g. 'Foo' is indexed as 'F', 'Fo' and 'Foo'.
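A minimal sketch of the token generation described above, assuming the whole company name is treated as a single value (a real analyzer would usually tokenize and lowercase first). This is essentially edge n-gram indexing. The class and method names are hypothetical and are not part of Lucene or Hibernate Search; in practice this logic would live in a custom analyzer or token filter.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: turns a value into all of its non-empty
 * leading substrings, so "Fo" can later be matched by an exact term lookup.
 */
final class PrefixTokens {

    private PrefixTokens() {}

    /** Returns every non-empty prefix of value, shortest first. */
    static List<String> prefixes(String value) {
        List<String> result = new ArrayList<>();
        for (int end = 1; end <= value.length(); end++) {
            result.add(value.substring(0, end));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(prefixes("Foo")); // prints [F, Fo, Foo]
    }
}
```

A value of length n contributes n terms, which is where the index growth comes from; capping the maximum prefix length is one common way to bound it.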

This got rid of the 'TooManyClauses' exception and performance improved by
orders of magnitude, but I would like to get some feedback from other users
on whether this is a good approach.
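The following toy model (plain Java, no Lucene dependency) illustrates why the two approaches behave so differently. It assumes the Lucene behaviour of this era: a wildcard or prefix query is rewritten into a BooleanQuery with one clause per distinct matching term, and BooleanQuery's maxClauseCount defaults to 1024. The class, its methods and the use of IllegalStateException as a stand-in for BooleanQuery.TooManyClauses are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model, not Lucene code: clause expansion vs. a prefix-indexed field. */
final class ClauseCountModel {

    /** Mirrors Lucene's default BooleanQuery maxClauseCount. */
    static final int MAX_CLAUSES = 1024;

    /** Clauses a rewritten prefix query would need: one per distinct matching term. */
    static int expandedClauseCount(NavigableSet<String> terms, String prefix) {
        int count = 0;
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            count++;
        }
        return count;
    }

    /** Stand-in for the rewrite step; throws where Lucene would throw TooManyClauses. */
    static int rewriteOrThrow(NavigableSet<String> terms, String prefix) {
        int clauses = expandedClauseCount(terms, prefix);
        if (clauses > MAX_CLAUSES) {
            throw new IllegalStateException("too many clauses: " + clauses);
        }
        return clauses;
    }

    /** Prefix index: each prefix term maps to the ids of the names that start with it. */
    static Map<String, List<Integer>> buildPrefixIndex(List<String> names) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int id = 0; id < names.size(); id++) {
            String name = names.get(id);
            for (int end = 1; end <= name.length(); end++) {
                index.computeIfAbsent(name.substring(0, end), k -> new ArrayList<>()).add(id);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical data: 2,000 distinct names starting with "M".
        TreeSet<String> terms = new TreeSet<>();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 2000; i++) {
            terms.add("M" + i);
            names.add("M" + i);
        }
        try {
            rewriteOrThrow(terms, "M");
        } catch (IllegalStateException e) {
            System.out.println("wildcard: " + e.getMessage()); // prints wildcard: too many clauses: 2000
        }
        // With the prefix index, "M" is a single term lookup regardless of how many names match.
        Map<String, List<Integer>> index = buildPrefixIndex(names);
        System.out.println("prefix index: 1 term, " + index.get("M").size() + " docs");
    }
}
```

In the model, the clause count of the wildcard query grows with the number of distinct matching terms, while the prefix-indexed lookup always touches exactly one term; the cost is moved to indexing time and index size instead.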

Of course the index size increased, but that was not an issue in this case.
Are there any potential problems with ranking/scoring?

Thanks for any feedback.


Hartmut Ferentschik
Ekholmsv.339 ,1, 127 45 Skärholmen, Sweden
Phone: +46 855 923 676 (h); +46 704 225 097 (m)
