lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: suppressing FreqProxPostingsArray
Date Mon, 19 Mar 2012 21:32:07 GMT
Hmm, I agree we could be more RAM efficient if the field is DOCS_ONLY.

We shouldn't have to allocate or use the docFreqs, lastDocCodes and
lastPositions arrays (3 of the 7); the others are still needed, I
think.
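A minimal sketch of that idea follows. DocsOnlyAwarePostingsArray is a hypothetical class, not Lucene's FreqProxPostingsArray. The array names are only modeled on the three mentioned above plus the other four (textStarts, intStarts, byteStarts, lastDocIDs), and should be read as illustrative. The per-field postings state is a set of parallel int[] arrays indexed by termID, and the three frequency/position arrays are simply left unallocated for a DOCS_ONLY field.

```java
// Hypothetical sketch, NOT Lucene's FreqProxPostingsArray: per-field postings
// state kept as parallel int[] arrays indexed by termID. Array names are
// modeled loosely on Lucene 3.x and are illustrative only.
class DocsOnlyAwarePostingsArray {
    final int size;

    // Needed for every indexed field.
    final int[] textStarts;
    final int[] intStarts;
    final int[] byteStarts;
    final int[] lastDocIDs;

    // Only needed when term frequencies/positions are indexed;
    // left null for a DOCS_ONLY field.
    final int[] docFreqs;
    final int[] lastDocCodes;
    final int[] lastPositions;

    DocsOnlyAwarePostingsArray(int size, boolean docsOnly) {
        this.size = size;
        textStarts = new int[size];
        intStarts = new int[size];
        byteStarts = new int[size];
        lastDocIDs = new int[size];
        docFreqs = docsOnly ? null : new int[size];
        lastDocCodes = docsOnly ? null : new int[size];
        lastPositions = docsOnly ? null : new int[size];
    }

    /** Bytes of int[] storage used per termID (array headers not counted). */
    int bytesPerPosting() {
        int numArrays = (docFreqs == null) ? 4 : 7;
        return numArrays * Integer.BYTES;
    }

    public static void main(String[] args) {
        System.out.println("all 7 arrays: "
                + new DocsOnlyAwarePostingsArray(1024, false).bytesPerPosting() + " bytes/term"); // 28
        System.out.println("DOCS_ONLY:    "
                + new DocsOnlyAwarePostingsArray(1024, true).bytesPerPosting() + " bytes/term");  // 16
    }
}
```

With all 7 arrays each termID costs 28 bytes of int[] storage. Skipping 3 of them brings that to 16 bytes, before counting term bytes and hash-table overhead.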

But that said, you shouldn't hit an OOME as long as your max heap size
is large enough (and your IndexWriterConfig's RAMBufferSizeMB is
small enough); Lucene should simply flush a new segment once the
buffered documents are using too much RAM.
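A rough back-of-the-envelope sketch of why the RAM buffer bounds this. It assumes every UUID value is a unique term, counts only the 28 bytes/term of the 7 parallel int[] arrays (ignoring term bytes, hash tables and stored fields, so real usage is higher), and uses a 16 MB buffer purely as an example value. RamBufferEstimate and its methods are hypothetical helpers, not Lucene API.

```java
// Hypothetical estimate, not IndexWriter's real RAM accounting: only the
// per-term parallel int[] arrays are counted.
class RamBufferEstimate {
    /** Bytes the per-term arrays would need if nothing were ever flushed. */
    static long unflushedBytes(long numDocs, int uniqueTermsPerDoc, int bytesPerTerm) {
        return numDocs * uniqueTermsPerDoc * bytesPerTerm;
    }

    /** Documents that fit in the RAM budget before a flush would be triggered. */
    static long docsPerFlush(double ramBufferMB, int uniqueTermsPerDoc, int bytesPerTerm) {
        long budgetBytes = (long) (ramBufferMB * 1024 * 1024);
        return budgetBytes / ((long) uniqueTermsPerDoc * bytesPerTerm);
    }

    public static void main(String[] args) {
        long docs = 100_000_000L;
        int termsPerDoc = 3;    // three UUID fields, every value assumed unique
        int bytesPerTerm = 28;  // 7 int[] entries per termID
        System.out.println(unflushedBytes(docs, termsPerDoc, bytesPerTerm)); // 8400000000
        System.out.println(docsPerFlush(16.0, termsPerDoc, bytesPerTerm));   // 199728
    }
}
```

Never flushing would need about 8.4 GB for these arrays alone; with the buffer capped at 16 MB, this simplified model flushes roughly every 200K documents, so the arrays never approach that size.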

Hmm, and you don't index massive documents.  How many UUIDs per document?

Mike McCandless

http://blog.mikemccandless.com



On Mon, Mar 19, 2012 at 3:29 PM, Ken McCracken <ken.mccracken@gmail.com> wrote:
> Hi,
>
> I am using lucene-3.5 and getting an OutOfMemoryError on a large indexing
> task of 100M documents. I am creating an index with 3 UUIDs as separate
> field values. I am using Store.YES on 1 of them and Store.NO on the
> others, and Index.NOT_ANALYZED_NO_NORMS on all three. I am also explicitly
> calling field.setIndexOptions(IndexOptions.DOCS_ONLY) and setting
> indexWriterConfig.setTermIndexInterval(termIndexInterval) to 1024. I am
> trying to index 100M records into my index.
>
> Is there any reason FreqProxTermsWriterPerField.FreqProxPostingsArray needs
> to be constructed even though I have positions, etc., suppressed? It
> seems that the reason I get an OutOfMemoryError is that 7 int[] arrays,
> each sized proportionally to the number of unique terms, are being
> constructed; however, at least some of them are probably wasteful given
> my indexing configuration.
>
> Any help is appreciated.
>
> Thanks,
> -Ken
>
>    [junit] Error:
>    [junit] Exception in thread "Thread-18" java.lang.OutOfMemoryError: Java heap space
>    [junit]     at org.apache.lucene.index.ParallelPostingsArray.<init>(ParallelPostingsArray.java:35)
>    [junit]     at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:190)
>    [junit]     at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:204)
>    [junit]     at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
>    [junit]     at org.apache.lucene.index.TermsHashPerField.growParallelPostingsArray(TermsHashPerField.java:137)
>    [junit]     at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:440)
>    [junit]     at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:94)
>    [junit]     at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)
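The trace fails inside grow → newInstance → the array constructor. A simplified sketch of why growth is the moment the heap tends to run out: while the larger set of parallel arrays is allocated and filled, the old set is still reachable, so the peak is roughly old plus new. ParallelArraysGrowth is a hypothetical class, and its ~1.5x growth factor is an assumption; Lucene's own oversizing policy may differ.

```java
// Hypothetical sketch of growing a set of parallel int[] arrays; the growth
// factor is assumed, not taken from Lucene.
class ParallelArraysGrowth {
    final int[][] arrays;

    ParallelArraysGrowth(int numArrays, int size) {
        arrays = new int[numArrays][size];
    }

    int size() {
        return arrays[0].length;
    }

    /** Returns a larger copy; old and new arrays are both reachable until this returns. */
    ParallelArraysGrowth grow() {
        int newSize = size() + (size() >> 1) + 1; // ~1.5x, an assumed policy
        ParallelArraysGrowth bigger = new ParallelArraysGrowth(arrays.length, newSize);
        for (int i = 0; i < arrays.length; i++) {
            System.arraycopy(arrays[i], 0, bigger.arrays[i], 0, size());
        }
        return bigger;
    }

    /** Peak int[] bytes live during one grow step: old arrays plus new arrays. */
    static long peakBytesDuringGrow(int numArrays, int oldSize) {
        long newSize = (long) oldSize + (oldSize >> 1) + 1;
        return (long) numArrays * (oldSize + newSize) * Integer.BYTES;
    }

    public static void main(String[] args) {
        ParallelArraysGrowth small = new ParallelArraysGrowth(7, 4);
        small.arrays[0][3] = 42;
        ParallelArraysGrowth bigger = small.grow();
        System.out.println(bigger.size() + " " + bigger.arrays[0][3]); // 7 42
        System.out.println(peakBytesDuringGrow(7, 1_000_000));         // 70000028
    }
}
```

For 7 arrays of 1M terms each, the old set is 28 MB, yet a single grow step briefly needs about 70 MB.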
