lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Wennström <rob...@agent25.se>
Subject RE: OutOfMemoryError with boolean queries
Date Wed, 19 Mar 2003 18:14:06 GMT
Sorry. I wasn't verbose enough.

I use the default memory settings. But my issue was the core structure of Lucene
taking up (it seems to me) more memory than it would have to, if it had a
different approach.
Correct me if I'm wrong, but it seems to me that BooleanQuery stores all hits
(as Bucket objects) from all terms in the query even if it is a simple  war* AND
wash* AND sad*. Instead of looking for wash* just in the war* hits (and then
looking for sad* in the remaining hits) it makes three separate searches, which
would be a waste of memory.

----- test output begins -----

Index size = 55000
Query: a*
Total memory before: 2031616
Searching for: a* (org.apache.lucene.search.PrefixQuery)
Total memory after: 55128064
53527 total matching documents (1984ms)
Query: e*
Total memory before: 55128064
Searching for: e* (org.apache.lucene.search.PrefixQuery)
Total memory after: 55128064
52456 total matching documents (984ms)
Query: a* AND e*
Total memory before: 55128064
Searching for: +a* +e* (org.apache.lucene.search.BooleanQuery)
Total memory after: 124882944
51267 total matching documents (2468ms)

----- test output ends -----

In my perfect world the memory allocation, when searching for  a* AND e*, should
not increase at all after the both separate searches  a*  and  e*, cause it
would just allocate space for a*-hits, and ignoring e*-hits that has no previous
hit.


My biggest index lies at 2,34 million documents during testing, but should grow
with approximately 10000 docs/day in production.
With that figure I wish for the best possible memory handling.


At the moment we use a search engine that, given the right question (or wrong),
consumes memory like a starving wolf and crashes the whole thing. The search
engine should be able to play with about 1GB RAM on the machine.
I just don't want the same possibilities of a crash with Lucene too.


I want to know if the Lucene developers feel that there are things to optimize
or if they have done everything like it should be from the start ?


thanks
/RW


> -----Ursprungligt meddelande-----
> Från: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Skickat: den 19 mars 2003 16:19
> Till: lucene-user@jakarta.apache.org
> Ämne: Re: OutOfMemoryError with boolean queries
> 
> 
> Robert,
> 
> I'm moving this to lucene-user, which is a more appropriate list for
> this type of a problem.
> You are not saying whether you are using some of those handy -X (-Xms
> -Xmx) command line switches when you invoke your application that dies
> with OutOfMemoryError.
> If you are not, try that, it may help.  I recall a few other people
> reporting the same problem and using -Xms and -Xmx solved their
> problem.
> 
> If your machine doesn't have the RAM it needs this won't help, of
> course :)
> 
> Otis
> 
> 
> --- Robert_Wennström <robert@agent25.se> wrote:
> > Hi,
> > 
> > I'm experiencing OutOfMemoryErrors when searching using many logical
> > ANDs
> > combined with prefix queries.
> > The reason is clearly too many returned hits waiting for combined
> > evaluation.
> > 
> > I was wondering if there are any thoughts of changing the search
> > approach to
> > something less memory consuming.
> > This is quite a big problem even with small sets of documents.
> > 
> > Example:
> > my 55000 document index runs out of memory when searching 
> for  a* AND
> > e*
> > 
> > Could you estimate the difficulty to change the behaviour to search
> > for e* in
> > just the hits matching a* ?
> > I'm about to put a BitSet at the innermost hit collection 
> to sort out
> > AND-clauses that hasn't been matched by previous AND-clauses.
> > Is there a better approach ?
> > 
> > 
> > Thanks for a great java project guys
> > 
> > ____________________________________________
> > Robert Wennström [developer, netadmin]
> > robert -at- agent25.com
> > www.agent25.com
> > 
> > 
> > 
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> > 
> 
> 
> __________________________________________________
> Do you Yahoo!?
> Yahoo! Platinum - Watch CBS' NCAA March Madness, live on your desktop!
> http://platinum.yahoo.com
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message