lucene-java-user mailing list archives

From "" <>
Subject Re: BooleanQuery$TooManyClauses
Date Mon, 11 Jul 2005 19:18:53 GMT
2500 vs 84. Wow. That's quite a few OR clauses I would be saving by 
following your advice to index only the parts of the datetime I plan 
to search on. Every ms counts.

Now I have a clear picture of how range query works. Great stuff. Thanks.
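
To check my understanding, here is a rough sketch in plain Java (no Lucene 
classes; the class and method names are made up for illustration). It assumes 
one term per calendar day or month and that a term matches when it falls 
between the two bounds in plain string order, inclusive. A real index only 
holds terms for dates that actually occur in documents, so these counts are 
upper bounds. Counting 1998 through 2005 as eight calendar years, the figures 
come out somewhat higher than the rough estimate quoted below, but the gap 
between the two granularities is the point.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: estimates how many terms an
// inclusive range such as [199801 TO 200512] could expand to.
public class RangeExpansion {

    // One YYYYMMDD term for every calendar day in firstYear..lastYear.
    static List<String> dayTerms(int firstYear, int lastYear) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyyMMdd");
        List<String> terms = new ArrayList<>();
        LocalDate end = LocalDate.of(lastYear, 12, 31);
        for (LocalDate d = LocalDate.of(firstYear, 1, 1); !d.isAfter(end); d = d.plusDays(1)) {
            terms.add(d.format(fmt));
        }
        return terms;
    }

    // One YYYYMM term for every calendar month in firstYear..lastYear.
    static List<String> monthTerms(int firstYear, int lastYear) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyyMM");
        List<String> terms = new ArrayList<>();
        YearMonth end = YearMonth.of(lastYear, 12);
        for (YearMonth m = YearMonth.of(firstYear, 1); !m.isAfter(end); m = m.plusMonths(1)) {
            terms.add(m.format(fmt));
        }
        return terms;
    }

    // Counts terms with lower <= term <= upper in string order. Each match
    // stands for one OR'd clause in the expanded query (assumption).
    static int countInRange(List<String> terms, String lower, String upper) {
        int n = 0;
        for (String t : terms) {
            if (t.compareTo(lower) >= 0 && t.compareTo(upper) <= 0) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int days = countInRange(dayTerms(1998, 2005), "199801", "200512");
        int months = countInRange(monthTerms(1998, 2005), "199801", "200512");
        System.out.println("YYYYMMDD terms in range: " + days);
        System.out.println("YYYYMM terms in range:   " + months);
    }
}
```

Note that with YYYYMMDD terms the upper bound "200512" sorts before 
"20051201", so December 2005 drops out of the range, just as Erik describes.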

Btw, coming from a db background, I'm used to writing queries so that 
the most selective comparison, the one likely to return the fewest 
rows, comes first in the where clause. A db can still be pretty dumb 
with bad statistics and choose the wrong execution plan, so I like to 
optimize for it whenever possible and force the issue.

If I have a sample lucene query:

"+a:abc +b:cde +d:bbd +date:[2001 TO 2005] -e:noway"

Does Lucene's execution engine try to figure out, via statistics or a 
guesstimate, which path to take first? Or does it just go brute force 
and follow the execution plan from left to right? Or does it run each 
clause individually, without executing the next search on the results 
of the prior one, and then combine them at the end?


Erik Hatcher wrote:
> On Jul 11, 2005, at 1:45 AM, wrote:
>> Did a google search on the problem when using the range search phrase 
>> of "+datefield:[199801 TO 200512]" (date stored as "YYYYMMDD") which 
>> returns 1 million hits.
>> error:$TooManyClauses
>> Adding "-Dorg.apache.lucene.maxClauseCount=2400" to java option 
>> allowed the search query to run without error. The actual value 
>> needed is between 2300 and 2400. At 2300 the query fails.
>> My question is how does Lucene perform range query? As a bunch of 
>> smaller boolean queries? How does one estimate the number of clauses 
>> required for a general query and more specifically on a range query?
> RangeQuery expands under the covers to a BooleanQuery with all matching 
> terms OR'd together.
> In your case, if you've indexed a term for every day in that range 
> using YYYYMMDD then you've got 2,524 terms roughly = 7 * 365 - 31 
> (minus 31 because you'd omit December '05 since you are only going to 
> 200512). If all you need is YYYYMM range searching, then index it as 
> that (that'd be 7 years * 12 months/year = 84 terms).
>     Erik
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
