lucene-java-user mailing list archives

From Lebiram <>
Subject Re: Optimize and Out Of Memory Errors
Date Wed, 24 Dec 2008 14:43:12 GMT

Hello Mark, 

At the moment, the index cannot be rebuilt to remove the norms.

Right now, I'm trying to figure out what Luke is doing by going through its source code.

Using whatever settings I find, I created a very small app just to do a bit of searching.
This small app has 1600 MB of heap space, while Luke has only 256 MB max.

On reading the same big one-segment index with 166 million docs,
Luke fails during CheckIndex when it checks the norms, but searching is okay as long as I
limit it to, say, a few thousand documents.
However, it's not the same for my app: I've been trying to limit it the same way, but it still reads way too much into memory.
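
Roughly, the small app does something like the following (a minimal sketch, not the exact code; the index path, field name, and query are placeholders, and I'm assuming the 2.4-era TopDocs API):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TopDocs;

    public class TinySearch {
        public static void main(String[] args) throws Exception {
            // Open the big one-segment index (path is a placeholder).
            IndexReader reader = IndexReader.open("/path/to/index");
            IndexSearcher searcher = new IndexSearcher(reader);

            // "body" and "foo" are just example field/query values.
            Query query = new QueryParser("body", new StandardAnalyzer()).parse("foo");

            // Cap the result set at a few thousand hits, like I do in Luke.
            TopDocs top = searcher.search(query, 5000);
            System.out.println("total hits: " + top.totalHits);

            searcher.close();
            reader.close();
        }
    }

Even with the cap, as far as I can tell the norms for a field get pulled in as a byte[maxDoc] array the first time that field is scored, so the hit limit by itself doesn't shrink that.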

I'm wondering if this has anything to do with Similarity and scoring.
Could you point me to some settings or any clever tweaks?

This problem will haunt me this Christmas. :O

From: Mark Miller <>
Sent: Wednesday, December 24, 2008 2:20:23 PM
Subject: Re: Optimize and Out Of Memory Errors

We don't know that those norms are "the" problem. Luke is loading norms if it's searching that index.
But what else is Luke doing? What else is your app doing? I suspect your app requires more
RAM than Luke does. How much RAM do you have, and how much are you allocating to the JVM?

The norms are not necessarily the problem you have to solve - but it would appear they are
taking up over 2 gig of memory. Unless you have some to spare (and it sounds like you may
not), it could be a good idea to turn them off for particular fields.
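
When you do get a chance to rebuild, omitting norms per field at index time looks roughly like this (just a sketch against the 2.4 Field API - the field names here are made up):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class NoNormsDoc {
        // Builds a document whose big text field carries no norms.
        static Document build(String title, String body) {
            Document doc = new Document();

            // ANALYZED_NO_NORMS still tokenizes and indexes the text, but skips
            // the per-document norm byte for this field.
            doc.add(new Field("body", body, Field.Store.NO, Field.Index.ANALYZED_NO_NORMS));

            // Keep norms only where length normalization / index-time boosts matter.
            doc.add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
            return doc;
        }
    }

(If the Field is constructed first, I believe field.setOmitNorms(true) achieves the same thing.)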

- Mark

Lebiram wrote:
> Is there a way to not factor the norms data into scoring somehow?
> I'm just stumped as to how Luke is able to do a search (with a limit) on the docs, but in
> my code it just dies with OutOfMemory errors.
> How does Luke not allocate these norms?
> ________________________________
> From: Mark Miller <>
> To:
> Sent: Tuesday, December 23, 2008 5:25:30 PM
> Subject: Re: Optimize and Out Of Memory Errors
> Mark Miller wrote:
>> Lebiram wrote:
>>> Also, what are norms?
>> Norms are a byte value per field stored in the index that is factored into the score.
>> It's used for length normalization (shorter documents = more important) and index-time
>> boosting. If you want either of those, you need norms. When norms are loaded up into an
>> IndexReader, they're loaded into a byte[maxDoc] array for each field - so even if one
>> document out of 400 million has a field, it's still going to load byte[maxDoc] for that
>> field (so a lot of wasted RAM). Did you say you had 400 million docs and 7 fields?
>> Google says that would be:
>>     400 million x 7 bytes = 2,670.28809 megabytes
>> On top of your other RAM usage.
> Just to avoid confusion, that should really read a byte per document per field. If I
> remember right, it gives 255 boost possibilities, limited to 25 with length normalization.
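
For the numbers in this thread, that back-of-the-envelope math works out like this (a rough sketch only - the path and field name are placeholders, and it assumes the 2.4 IndexReader API):

    import org.apache.lucene.index.IndexReader;

    public class NormsRamEstimate {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open("/path/to/index"); // placeholder path

            // Norms cost one byte per document per field that has norms.
            int maxDoc = reader.maxDoc();   // ~166 million in the index discussed here
            int fieldsWithNorms = 7;        // example field count from earlier in the thread

            long bytes = (long) maxDoc * fieldsWithNorms;
            System.out.println("estimated norms RAM: " + bytes / (1024.0 * 1024.0) + " MB");
            // 400 million docs x 7 fields -> ~2670 MB
            // 166 million docs x 7 fields -> ~1108 MB, if all seven fields' norms get loaded

            // Touching the norms for a field pulls its byte[maxDoc] array into memory:
            // byte[] norms = reader.norms("body");

            reader.close();
        }
    }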

