lucene-java-user mailing list archives

From Lucinda_Rockem...@vwr.com
Subject OutOfMemoryError
Date Fri, 02 May 2003 21:34:23 GMT
Hello:

I'm using lucene (lucene-1.3-dev1) and tomcat (4.1) for my company's
websites.

The largest number of documents contained in any index is approx. 800,000.
I'm getting OutOfMemoryErrors sporadically.  Some are definitely related to
wildcard queries.  The memory assigned to my JVM(s) is -Xms768m -Xmx768m.

I found lucene-user group's msgNo 3956 which seems to sum up what I'm
seeing (portion pasted below).

from msgNo=3956:

From: Robert Wennström <robert@agent25.se>
Subject: OutOfMemoryError with boolean queries
Date: Wed, 19 Mar 2003 19:14:06 +0100


Sorry. I wasn't verbose enough.

I use the default memory settings. But my issue was the core structure of
Lucene taking up (it seems to me) more memory than it would have to, if it
had a different approach.
Correct me if I'm wrong, but it seems to me that BooleanQuery stores all
hits (as Bucket objects) from all terms in the query even if it is a simple
war* AND wash* AND sad*. Instead of looking for wash* just in the war* hits
(and then looking for sad* in the remaining hits) it makes three separate
searches, which would be a waste of memory.
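
The narrowing approach described above can be sketched as follows. This is
only an illustration of the idea, not Lucene's implementation or API: the
class ConjunctionSketch and its methods are hypothetical, and it assumes the
matches for each clause are already available as ascending arrays of
document IDs (for a prefix such as war*, that array would itself be a union
over every term the prefix expands to).

```java
import java.util.Arrays;

// Hypothetical sketch: AND-ing clauses by narrowing one candidate set,
// instead of collecting every hit of every clause first.
public class ConjunctionSketch {

    // Two-pointer intersection of two ascending doc-ID arrays.
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out[n++] = a[i]; // present in both clauses
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    // Keeps only the surviving candidates between clauses; stops early
    // once no document can match.
    static int[] intersectAll(int[]... clauses) {
        if (clauses.length == 0) {
            return new int[0];
        }
        int[] candidates = clauses[0];
        for (int k = 1; k < clauses.length && candidates.length > 0; k++) {
            candidates = intersect(candidates, clauses[k]);
        }
        return candidates;
    }

    public static void main(String[] args) {
        int[] war = {1, 4, 7, 9, 12};   // made-up doc IDs for war*
        int[] wash = {2, 4, 9, 12, 15}; // made-up doc IDs for wash*
        int[] sad = {4, 8, 12};         // made-up doc IDs for sad*
        // prints [4, 12]
        System.out.println(Arrays.toString(intersectAll(war, wash, sad)));
    }
}
```

With this shape the working set after each step is never larger than the
smallest clause seen so far, which is the saving the paragraph above is
asking for.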

----- test output begins -----

Index size = 55000
Query: a*
Total memory before: 2031616
Searching for: a* (org.apache.lucene.search.PrefixQuery)
Total memory after: 55128064
53527 total matching documents (1984ms)
Query: e*
Total memory before: 55128064
Searching for: e* (org.apache.lucene.search.PrefixQuery)
Total memory after: 55128064
52456 total matching documents (984ms)
Query: a* AND e*
Total memory before: 55128064
Searching for: +a* +e* (org.apache.lucene.search.BooleanQuery)
Total memory after: 124882944
51267 total matching documents (2468ms)

----- test output ends -----
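
A note on reading these figures: if the "Total memory" values come from
Runtime.totalMemory() (an assumption, the test program is not shown), they
report the heap the JVM has reserved, which tends to stay high after a
search even when the objects have become garbage. A minimal sketch of
measuring used heap instead, with a hypothetical HeapSketch helper:

```java
// Hypothetical helper: reports used heap (reserved minus free) rather
// than the reserved heap alone.
public class HeapSketch {

    // Bytes currently occupied on the heap, including garbage that has
    // not been collected yet.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Rough change in used heap across a piece of work. System.gc() is
    // only a hint to the JVM, so the result is an estimate.
    static long measure(Runnable work) {
        System.gc();
        long before = usedHeapBytes();
        work.run();
        return usedHeapBytes() - before;
    }

    public static void main(String[] args) {
        final int[][] holder = new int[1][];
        long delta = measure(new Runnable() {
            public void run() {
                holder[0] = new int[1000000]; // stand-in for running a query
            }
        });
        System.out.println("Approximate heap growth in bytes: " + delta);
    }
}
```

Numbers taken this way are still approximate, but they separate memory a
query actually holds from heap the JVM has merely grown into.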

In my perfect world the memory allocation, when searching for a* AND e*,
should not increase at all after the two separate searches a* and e*,
because it would just allocate space for the a* hits and ignore any e* hits
that have no previous hit.


My biggest index lies at 2.34 million documents during testing, but should
grow by approximately 10,000 docs/day in production.
With that figure I wish for the best possible memory handling.


At the moment we use a search engine that, given the right question (or
wrong), consumes memory like a starving wolf and crashes the whole thing.
The search engine should be able to play with about 1GB RAM on the machine.
I just don't want the same possibilities of a crash with Lucene too.


I want to know if the Lucene developers feel that there are things to
optimize, or if they have done everything as it should be from the start?


thanks
/RW



Has there been any new input on this?  I have tested and can set my jvm
memory as high as 1.5G, but that doesn't seem to resolve the issue, only
delay the OutOfMemoryError.

Thanks for everything.

Lucinda R. Rockemore
VWR International . e-Business
e-mail: lucinda_rockemore@vwr.com
phone: 610 429 2731
fax: 610 429 5559












