lucene-java-user mailing list archives

From: Michael McCandless <>
Subject: Re: Optimize and Out Of Memory Errors
Date: Tue, 23 Dec 2008 14:08:26 GMT

How many indexed fields do you have, overall, in the index?

If you have a very large number of fields that are "sparse" (meaning
any given document has only a small subset of the fields), then norms
could explain what you are seeing.

Norms are not stored sparsely, so when segments are merged the "holes"
get filled in: every document then consumes one byte, on disk and in
RAM, for each indexed field that has norms enabled. Turning off norms
on sparse fields would resolve it, but you must rebuild the entire
index, because if even a single doc in the index has norms enabled for
a given field, the norms "spread" to all documents on merge.
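
For the index described below (400,000,000 documents), the arithmetic
is stark: norms cost one byte per document per normed field, so each
such field needs roughly 400 MB of heap once the index is a single
segment, and with -Xmx1600m a handful of normed fields will exhaust
the heap. Here is a minimal sketch, against the 2.3 API, of indexing
with norms omitted; the field names, values, and index path are
hypothetical:

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;

  public class OmitNormsExample {
    public static void main(String[] args) throws Exception {
      IndexWriter writer = new IndexWriter("/path/to/index",
          new StandardAnalyzer(), true);

      Document doc = new Document();

      // A tokenized sparse field with norms turned off; without
      // setOmitNorms(true) this field would cost one byte for every
      // document in the merged index, whether the field is present
      // in that document or not.
      Field sparse = new Field("sparseField", "some value",
          Field.Store.NO, Field.Index.TOKENIZED);
      sparse.setOmitNorms(true);
      doc.add(sparse);

      // For untokenized fields, Field.Index.NO_NORMS indexes the raw
      // value and omits norms in one step.
      doc.add(new Field("id", "42", Field.Store.YES,
          Field.Index.NO_NORMS));

      writer.addDocument(doc);
      writer.close();
    }
  }

Note that omitting norms disables length normalization and index-time
boosts for that field, which is usually acceptable for sparse flag- or
ID-style fields.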


Utan Bisaya wrote:

> Recently our Lucene version was upgraded to 2.3.1, and the index had
> to be rebuilt over several weeks, bringing the entire index to a
> total of about 20 GB.
> After the rebuild, a weekly Sunday optimization task was executed.
> During that time, the optimization failed several times with OOM
> errors, but after a couple of tries it completed.
> So the entire index is now a single 20 GB segment.
> The problem is that any subsequent search on that index fails with
> an OOM error while Lucene is reading bytes.
> Our environment:
> JVM -Xmx1600m (this is the maximum we could set on this box, since
> it's Windows)
> 8 GB memory available on the box
> 4G CPU (8 cores), but only 12.5% is used (not sure whether this
> matters)
> Hard disk space available is 120 GB
> mergeFactor and the other Lucene settings are at their defaults.
> We inspected this 20 GB index using Luke, and it contains
> 400,000,000 documents. Luke was able to count the docs, but when we
> run a search in Luke it fails with OOM errors.
> We also ran CheckIndex; the tool fails on the 20 GB segment but
> succeeds on the others.
> We've managed to roll back to a previous unoptimized copy of the
> index (about 20 GB or so), and searches are fine now. This
> unoptimized index is made up of several 8 GB segments and a few
> smaller ones.
> However, there is a real possibility that the optimization error
> could happen again...
> Does anybody have insights on why this is happening?

