lucene-java-user mailing list archives

From Lebiram <lebi...@ymail.com>
Subject Re: Optimize and Out Of Memory Errors
Date Wed, 24 Dec 2008 13:19:30 GMT
Is there a way to not factor norms data into scoring somehow?

I'm just stumped as to how Luke is able to do a search (with a limit) on the docs, while my code
just dies with OutOfMemory errors.
How does Luke not allocate these norms?
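For reference, this is the sort of thing I'm hoping is possible - a minimal sketch against the
Lucene 2.4 Field API (the field names and helper class are just placeholders), where norms are
omitted at index time so, as I understand it, there is nothing for the reader to load for that field:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class NoNormsDocExample {
    // Sketch only: "title" and "body" are placeholder field names.
    public static Document build(String titleText, String bodyText) {
        Document doc = new Document();

        // ANALYZED_NO_NORMS tokenizes the text but writes no norms for the
        // field, i.e. no index-time boost and no length normalization on it.
        doc.add(new Field("body", bodyText, Field.Store.NO,
                          Field.Index.ANALYZED_NO_NORMS));

        // The same effect on an already-constructed Field:
        Field title = new Field("title", titleText, Field.Store.YES,
                                Field.Index.ANALYZED);
        title.setOmitNorms(true);
        doc.add(title);

        return doc;
    }
}

If I understand the javadocs right, this only affects newly indexed documents, so segments that
already carry norms for a field would need a full reindex before the memory actually goes away.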




________________________________
From: Mark Miller <markrmiller@gmail.com>
To: java-user@lucene.apache.org
Sent: Tuesday, December 23, 2008 5:25:30 PM
Subject: Re: Optimize and Out Of Memory Errors

Mark Miller wrote:
> Lebiram wrote:
>> Also, what are norms 
> Norms are a byte value per field, stored in the index, that is factored into the score.
> It's used for length normalization (shorter documents = more important) and index-time
> boosting. If you want either of those, you need norms. When norms are loaded up into an
> IndexReader, they are loaded into a byte[maxDoc] array for each field - so even if only
> one document out of 400 million has a field, it will still load byte[maxDoc] for that
> field (a lot of wasted RAM). Did you say you had 400 million docs and 7 fields? Google
> says that would be:
> 
> 
>    400 million x 7 byte = 2 670.28809 megabytes
> 
> On top of your other RAM usage.
Just to avoid confusion, that should really read a byte per document per field. If I remember
right, it gives 255 boost possibilities, limited to 25 with length normalization.
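If it helps to sanity check that against the actual index, something like this should give a
rough idea (an untested sketch using standard IndexReader calls; the index path comes from the
command line). It adds up one byte per document for every field that still has norms:

import java.io.IOException;
import java.util.Collection;
import org.apache.lucene.index.IndexReader;

public class NormsRamEstimate {
    public static void main(String[] args) throws IOException {
        // One byte per document per field that has norms, no matter how few
        // documents actually contain that field.
        IndexReader reader = IndexReader.open(args[0]); // index directory path
        Collection fields = reader.getFieldNames(IndexReader.FieldOption.ALL);
        long bytes = 0;
        for (Object o : fields) {
            String field = (String) o;
            if (reader.hasNorms(field)) {          // false if the field omits norms
                bytes += (long) reader.maxDoc();   // byte[maxDoc()] per field
            }
        }
        System.out.println("approx norms RAM: "
                + (bytes / (1024.0 * 1024.0)) + " MB");
        reader.close();
    }
}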

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

