lucene-java-user mailing list archives

From Ken McCracken <ken.mccrac...@gmail.com>
Subject suppressing FreqProxPostingsArray
Date Mon, 19 Mar 2012 19:29:41 GMT
Hi,

I am using lucene-3.5 and getting an OutOfMemoryError on a large indexing
task of 100M documents.  Each document has 3 UUIDs as separate field
values.  I am using Store.YES on one of them and Store.NO on the others,
and Index.NOT_ANALYZED_NO_NORMS on all three.  I am also explicitly calling
field.setIndexOptions(IndexOptions.DOCS_ONLY);
and
indexWriterConfig.setTermIndexInterval(termIndexInterval);
with termIndexInterval set to 1024.

Is there any reason FreqProxTermsWriterPerField.FreqProxPostingsArray needs
to be constructed even though I have the positions etc. suppressed?  It
seems that I get the OutOfMemoryError because 7 int[] arrays, each of size
proportional to the number of unique field values, are being constructed;
at least some of them are probably wasted given my indexing configuration.
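
As a rough sketch of the arithmetic behind that concern: the figure of 7
int[] comes from the message above, while the class name, method names, and
sample capacities below are hypothetical and chosen only for illustration.
The estimate ignores array headers and everything else the indexer holds in
RAM, and the number of term slots actually buffered depends on flush
settings that are not stated here.  The stack trace further down shows the
allocation happening inside ParallelPostingsArray.grow, so the sketch also
estimates the moment when the old and the new arrays are both live.

```java
// Back-of-the-envelope estimate only; not Lucene code.
class PostingsArrayEstimate {
    // Number of parallel int[] arrays, as reported in the message.
    static final int PARALLEL_INT_ARRAYS = 7;
    // A Java int is always 4 bytes.
    static final int BYTES_PER_INT = 4;

    // Bytes needed for the int payload of all parallel arrays at a given
    // capacity (array object headers are ignored).
    static long bytesForCapacity(long termSlots) {
        return termSlots * PARALLEL_INT_ARRAYS * BYTES_PER_INT;
    }

    // While growing, the old arrays stay reachable until their contents
    // have been copied, so both generations count toward the peak.
    static long peakBytesDuringGrow(long oldSlots, long newSlots) {
        return bytesForCapacity(oldSlots) + bytesForCapacity(newSlots);
    }

    public static void main(String[] args) {
        // Hypothetical capacities, purely for illustration.
        long[] capacities = {1_000_000L, 10_000_000L, 100_000_000L};
        for (long slots : capacities) {
            long mb = bytesForCapacity(slots) / (1024L * 1024L);
            System.out.printf("%,d term slots -> about %,d MB%n", slots, mb);
        }
    }
}
```

Under these assumptions each buffered unique term costs 28 bytes in these
arrays alone, which is why fields made up entirely of distinct UUIDs can
press hard on the heap between flushes.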

Any help is appreciated.

Thanks,
-Ken

    [junit] Error:
    [junit] Exception in thread "Thread-18" java.lang.OutOfMemoryError: Java heap space
    [junit]     at org.apache.lucene.index.ParallelPostingsArray.<init>(ParallelPostingsArray.java:35)
    [junit]     at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:190)
    [junit]     at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:204)
    [junit]     at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
    [junit]     at org.apache.lucene.index.TermsHashPerField.growParallelPostingsArray(TermsHashPerField.java:137)
    [junit]     at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:440)
    [junit]     at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:94)
    [junit]     at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)
