lucene-java-user mailing list archives

From Martin.St...@materna.de
Subject Understanding TooManyClauses-Exception and Query-RAM-size
Date Thu, 08 Jul 2004 17:08:01 GMT

Hi,

A couple of weeks ago we migrated from Lucene 1.2 to 1.4rc3. Everything went
smoothly, but we are experiencing some problems with the new limit


	maxClauseCount=1024

which leads to exceptions of type

	org.apache.lucene.search.BooleanQuery$TooManyClauses 

when certain RangeQueries are executed (in fact, we get this exception when
we execute certain wildcard queries, too). Although we are working with a
fairly small index of about 35.000 documents, we encounter this exception
when we search on the property "modificationDate". For example:

	modificationDate:[000000 TO 0dwc970kw] 

It is my understanding that Lucene iterates over all modificationDate terms
in the index and expands the RangeQuery into an internal OR query with one
clause for every modificationDate entry that falls within the given range.
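
To make that understanding concrete, here is a minimal sketch in plain Java
that does not use any Lucene classes. It assumes the expansion works roughly
as described: one clause per distinct term in the range, and a failure once
the clause limit is reached. The names RangeExpansionSketch, expandRange and
encode are made up for illustration, and the zero-padded base-36 encoding is
only a guess based on the sample value above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration only; this is not Lucene code.
final class RangeExpansionSketch {

    // Mirrors the default limit of 1024 mentioned above.
    static final int MAX_CLAUSE_COUNT = 1024;

    // Assumed encoding: base-36, zero-padded to 9 characters so that
    // lexicographic order matches numeric order.
    static String encode(long value) {
        String digits = Long.toString(value, 36);
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < 9; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    // Collects one "clause" per distinct term in [lower, upper] and fails
    // as soon as a clause would have to be added beyond maxClauses.
    static List<String> expandRange(NavigableSet<String> terms,
                                    String lower, String upper, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms.subSet(lower, true, upper, true)) {
            if (clauses.size() >= maxClauses) {
                throw new IllegalStateException(
                        "too many clauses: limit is " + maxClauses);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        // 2000 distinct date terms, all inside the queried range.
        NavigableSet<String> terms = new TreeSet<>();
        for (long i = 0; i < 2000; i++) {
            terms.add(encode(i));
        }
        try {
            expandRange(terms, "000000", "0dwc970kw", MAX_CLAUSE_COUNT);
        } catch (IllegalStateException e) {
            System.out.println("Range expansion failed: " + e.getMessage());
        }
    }
}
```

In this sketch the number of clauses depends on the number of distinct terms
in the range, not on the number of matching documents.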

Obviously, we will have many different modificationDates which means that we
will easily hit the limit of 1024 clauses. I know that we can increase the
limit by setting the JVM-Parameter

	-Dorg.apache.lucene.maxClauseCount=100000

or by choosing an even higher number that exceeds the number of indexed
documents, but then again this limit was introduced for a reason, namely to
prevent OutOfMemory errors in case the internal OR query gets too large.


I am wondering how people with millions of indexed documents handle this
situation. Do you take care not to construct this type of "large"
RangeQuery by keeping the query's upper and lower boundaries close together?
Do you keep increasing the maxClauseCount limit as your index grows?

A couple of days ago Doug Cutting gave a formula for estimating the
RAM size of a given query:


1 byte * number of searchable fields in your index * number of docs in your index

plus

1k bytes * number of terms in query

plus

1k bytes * number of phrase terms in query

According to Luke, our index consists of about 500 fields and 35.000
documents. The first line gives

1 * 500 * 35.000 = 17.500.000 Bytes = 17.5 MB

If we assume that the RangeQuery includes virtually all of our documents,
the second line gives

1 k * 35.000 = 35 MB 

We don't expect too many phrase terms, so we have a total size of about 52.5
MB per query.

Once again, our index is pretty small, yet we get pretty big query RAM
sizes. If we increase the number of indexed documents by a factor of 10, we
will have about 525 MB, which in my opinion is getting out of hand.
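
For anyone who wants to check the arithmetic, here is a small sketch of the
estimate as quoted above. The class and method names (QueryRamEstimate,
estimateBytes) are made up for illustration, "1k" is taken as 1000 bytes to
match the figures above, and the number of query terms is assumed to equal
the number of documents, as in the worst case described.

```java
// Hypothetical helper that only reproduces the arithmetic of the quoted
// formula; it does not measure anything inside Lucene.
final class QueryRamEstimate {

    static final long BYTES_PER_TERM = 1000L; // "1k bytes", taken as 1000

    static long estimateBytes(long searchableFields, long docs,
                              long queryTerms, long phraseTerms) {
        long perFieldPerDoc = 1L * searchableFields * docs; // first line
        long terms = BYTES_PER_TERM * queryTerms;            // second line
        long phrases = BYTES_PER_TERM * phraseTerms;         // third line
        return perFieldPerDoc + terms + phrases;
    }

    public static void main(String[] args) {
        // Current index: 500 fields, 35.000 docs, range covering all docs.
        long current = estimateBytes(500, 35_000, 35_000, 0);
        // Same index with ten times as many documents and terms.
        long tenfold = estimateBytes(500, 350_000, 350_000, 0);
        System.out.println("current: " + current / 1_000_000.0 + " MB");
        System.out.println("tenfold: " + tenfold / 1_000_000.0 + " MB");
    }
}
```

Both the per-document part and the per-term part grow linearly with the
index in this worst case, which is why the total scales by the same factor.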

Am I missing something? How do other people handle this situation? 


Thanks,

Martin Stein

-- 
Dipl.-Math. Martin Stein
Software Engineer 
CC Portale und Content Management
Business Unit Information
_________________________________________________
MATERNA GmbH Information & Communications
Vosskuhle 37 * 44141 Dortmund * Germany
phone.: +49 231 5599 - 8614 * fax: -8975  
<http://www.materna.de>

