lucene-java-user mailing list archives

From Lebiram <>
Subject Re: Optimize and Out Of Memory Errors
Date Wed, 24 Dec 2008 13:19:30 GMT
Is there a way to not factor norms data into scoring somehow?

I'm just stumped as to how Luke is able to do a search (with a limit) on the docs, but in my code
it just dies with OutOfMemory errors.
How does Luke not allocate these norms?

From: Mark Miller <>
Sent: Tuesday, December 23, 2008 5:25:30 PM
Subject: Re: Optimize and Out Of Memory Errors

Mark Miller wrote:
> Lebiram wrote:
>> Also, what are norms?
> Norms are a byte value per field stored in the index that is factored into the score.
> It's used for length normalization (shorter documents = more important) and index-time
> boosting. If you want either of those, you need norms. When norms are loaded up into an
> IndexReader, they are loaded into a byte[maxDoc] array for each field - so even if only
> one document out of 400 million has a field, it's still going to load byte[maxDoc] for
> that field (so a lot of wasted RAM). Did you say you had 400 million docs and 7 fields?
> Google says that would be:
>
>    400 million x 7 bytes = 2,670.28809 megabytes
>
> On top of your other RAM usage.
Just to avoid confusion, that should really read a byte per document per field. If I remember
right, it gives 255 boost possibilities, limited to 25 with length normalization.
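
Not from the original thread, but a minimal sketch of one way to avoid norms entirely, assuming
a Lucene 2.4-era API: a field indexed with norms omitted (Field.Index.ANALYZED_NO_NORMS, or
setOmitNorms(true) on the field) has no norms written, so an IndexReader does not need a real
byte[maxDoc] array for it. The cost is losing length normalization and index-time boosts for
that field, and already-indexed documents generally keep their norms until re-indexed. The class
and field names below are illustrative only.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;

public class OmitNormsSketch {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
                true, IndexWriter.MaxFieldLength.UNLIMITED);

        Document doc = new Document();
        // Analyzed as usual, but no norms are written for this field.
        doc.add(new Field("body", "some long document text",
                Field.Store.NO, Field.Index.ANALYZED_NO_NORMS));
        writer.addDocument(doc);
        writer.close();

        IndexReader reader = IndexReader.open(dir);
        // hasNorms() reports whether real norms exist for the field; when they do,
        // the reader holds a byte[maxDoc()] array for each such field.
        System.out.println("norms stored for 'body': " + reader.hasNorms("body"));
        reader.close();
    }
}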

