lucene-java-user mailing list archives

From Ken McCracken <ken.mccrac...@gmail.com>
Subject Re: suppressing FreqProxPostingsArray
Date Tue, 20 Mar 2012 22:20:47 GMT
Hi Mike,

Thanks for the response.  We will investigate further and look for a
clean way to suppress at least the 3 extra array allocations.

Cheers,

-Ken

On Mar 19, 2012, at 5:32 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> Hmm, I agree we could be more RAM efficient if the field is DOCS_ONLY.
>
> We shouldn't have to allocate/use docFreqs, lastDocCodes,
> lastPositions arrays (3 of the 7); the others are still needed, I
> think.
>
> But, that said, you shouldn't hit OOME, as long as your max heap size
> is large enough (and your IndexWriterConfig's RAMBufferSizeMB is
> small enough); Lucene should simply flush a new segment once the
> buffered documents are using too much RAM.
>
> Hmm, and you don't index massive documents.  How many UUIDs per document?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
>
> On Mon, Mar 19, 2012 at 3:29 PM, Ken McCracken <ken.mccracken@gmail.com> wrote:
>> Hi,
>>
>> I am using lucene-3.5 and getting an OutOfMemoryError on a large indexing
>> task of 100M documents.  I am creating an index with 3 UUIDs as separate
>> field values.  I am using Store.YES on 1 of them and Store.NO on the
>> others; I am using Index.NOT_ANALYZED_NO_NORMS on all three.  I am also
>> explicitly calling
>> field.setIndexOptions(IndexOptions.DOCS_ONLY);          and
>> indexWriterConfig.setTermIndexInterval(termIndexInterval);
>> with termIndexInterval set to 1024.  I am trying to index 100M records
>> into my index.
>>
>> Is there any reason FreqProxTermsWriterPerField.FreqProxPostingsArray
>> needs to be constructed even though I have the positions etc. suppressed?
>> It seems that the reason I get an OutOfMemoryError is that 7 int[] of size
>> proportional to the number of unique field values are being constructed;
>> however, at least some of them are probably wasteful given my indexing
>> configuration.
>>
>> Any help is appreciated.
>>
>> Thanks,
>> -Ken
>>
>>    [junit] Error:
>>    [junit] Exception in thread "Thread-18" java.lang.OutOfMemoryError: Java heap space
>>    [junit]     at org.apache.lucene.index.ParallelPostingsArray.<init>(ParallelPostingsArray.java:35)
>>    [junit]     at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:190)
>>    [junit]     at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:204)
>>    [junit]     at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
>>    [junit]     at org.apache.lucene.index.TermsHashPerField.growParallelPostingsArray(TermsHashPerField.java:137)
>>    [junit]     at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:440)
>>    [junit]     at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:94)
>>    [junit]     at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
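
For a rough sense of what dropping the 3 arrays would save, here is a
back-of-envelope sketch in plain Java.  It uses only the counts mentioned
in this thread (7 int[] today, 3 of them avoidable for DOCS_ONLY) and
assumes 4 bytes per int slot per buffered unique term.  The class name,
method names, and the figure of 10M buffered terms are hypothetical and
chosen for illustration; the sketch ignores array over-allocation on
growth and everything else held in the RAM buffer, so treat the numbers
as a lower bound rather than a measurement.

```java
// Rough estimate only: counts the int[] slots of the parallel postings arrays.
final class PostingsArrayEstimate {
    // Each parallel array is an int[], so one slot costs 4 bytes per unique term.
    static final int BYTES_PER_INT = 4;

    // Counts taken from the thread: 7 arrays today, 3 avoidable for DOCS_ONLY.
    static final int ARRAYS_TODAY = 7;
    static final int ARRAYS_AVOIDABLE = 3;

    // Bytes of parallel-array storage consumed by a single buffered unique term.
    static long bytesPerTerm(int arrayCount) {
        return (long) arrayCount * BYTES_PER_INT;
    }

    // Bytes of parallel-array storage for all unique terms buffered before a flush.
    static long totalBytes(long bufferedUniqueTerms, int arrayCount) {
        return bufferedUniqueTerms * bytesPerTerm(arrayCount);
    }

    public static void main(String[] args) {
        // Hypothetical number of unique terms buffered in RAM before a flush.
        long bufferedUniqueTerms = 10_000_000L;

        long today = totalBytes(bufferedUniqueTerms, ARRAYS_TODAY);
        long trimmed = totalBytes(bufferedUniqueTerms, ARRAYS_TODAY - ARRAYS_AVOIDABLE);

        System.out.printf("with %d arrays: %d bytes%n", ARRAYS_TODAY, today);
        System.out.printf("with %d arrays: %d bytes%n",
                ARRAYS_TODAY - ARRAYS_AVOIDABLE, trimmed);
        System.out.printf("saved: %d bytes%n", today - trimmed);
    }
}
```

Under these assumptions the saving is 12 bytes per buffered unique term,
which is a constant-factor improvement.  It would not by itself remove
the need for a RAM buffer size that fits comfortably inside the heap.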


