jackrabbit-users mailing list archives

From Johannes Boneschanscher <jackrab...@boneschanscher.net>
Subject OutOfMemoryError on reindexing jackrabbit
Date Mon, 22 Jun 2009 14:42:32 GMT
Hi fellow jackrabbit users,

When reindexing the entire Jackrabbit 1.4 repository I run into the 
following problem. With Sun JRE 6 I get the following stack trace (Java 5 
doesn't print any):

java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2734)
    at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
    at java.util.ArrayList.add(ArrayList.java:351)
    at org.pdfbox.pdfparser.PDFStreamParser.parse(PDFStreamParser.java:105)
    at org.pdfbox.cmapparser.CMapParser.parse(CMapParser.java:97)
    at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:326)
    at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:174)
    at org.pdfbox.util.PDFTextStripper.showString(PDFTextStripper.java:461)
    at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:690)
    at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:128)
    at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:268)
    at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:200)
    at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:172)
    at org.apache.jackrabbit.extractor.PdfTextExtractor.extractText(PdfTextExtractor.java:75)
    at org.apache.jackrabbit.extractor.CompositeTextExtractor.extractText(CompositeTextExtractor.java:90)
    at org.apache.jackrabbit.core.query.lucene.JackrabbitTextExtractor.extractText(JackrabbitTextExtractor.java:195)
    at org.apache.jackrabbit.core.query.lucene.NodeIndexer.addBinaryValue(NodeIndexer.java:393)
    at org.apache.jackrabbit.core.query.lucene.NodeIndexer.addValue(NodeIndexer.java:282)
    at org.apache.jackrabbit.core.query.lucene.NodeIndexer.createDoc(NodeIndexer.java:221)
    at org.apache.jackrabbit.core.query.lucene.SearchIndex.createDocument(SearchIndex.java:861)
    at org.apache.jackrabbit.core.query.lucene.MultiIndex.createDocument(MultiIndex.java:803)
    at org.apache.jackrabbit.core.query.lucene.MultiIndex.createDocument(MultiIndex.java:818)
    at org.apache.jackrabbit.core.query.lucene.MultiIndex$AddNode.execute(MultiIndex.java:1519)
    at org.apache.jackrabbit.core.query.lucene.MultiIndex.executeAndLog(MultiIndex.java:936)
    at org.apache.jackrabbit.core.query.lucene.MultiIndex.createIndex(MultiIndex.java:1017)
    at org.apache.jackrabbit.core.query.lucene.MultiIndex.createIndex(MultiIndex.java:1023)
    at org.apache.jackrabbit.core.query.lucene.MultiIndex.createIndex(MultiIndex.java:1023)
    at org.apache.jackrabbit.core.query.lucene.MultiIndex.createIndex(MultiIndex.java:1023)
    at org.apache.jackrabbit.core.query.lucene.MultiIndex.createIndex(MultiIndex.java:1023)
    at org.apache.jackrabbit.core.query.lucene.MultiIndex.createIndex(MultiIndex.java:1023)
    at org.apache.jackrabbit.core.query.lucene.MultiIndex.createIndex(MultiIndex.java:1023)
    at org.apache.jackrabbit.core.query.lucene.MultiIndex.createIndex(MultiIndex.java:1023)

This is very unexpected, because memory usage stays between 128 and 256 
MB while the maximum heap size is set to 1.3 GB. Plenty of system memory 
is available as well.
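
One way to double-check the heap limit the JVM is really running with 
(as opposed to the configured value) would be to log it from inside the 
same process. This is only a minimal sketch using the standard 
java.lang.management API; the class name HeapCheck and the method 
describeHeap are made up for illustration and are not part of Jackrabbit:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of Jackrabbit: reports the heap figures
// the running JVM sees, so they can be compared with the -Xmx setting.
public class HeapCheck {
    private static final long MB = 1024L * 1024L;

    static String describeHeap() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the JVM reports no defined maximum
        String maxText = (max < 0) ? "undefined" : (max / MB) + " MB";
        return "used=" + (heap.getUsed() / MB) + " MB"
                + ", committed=" + (heap.getCommitted() / MB) + " MB"
                + ", max=" + maxText;
    }

    public static void main(String[] args) {
        System.out.println(describeHeap());
    }
}
```

If the reported max is far below 1.3 GB, the heap setting is probably not 
reaching the JVM that performs the reindexing.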

It may be related to:

https://issues.apache.org/jira/browse/PDFBOX-313

Has this been resolved in a newer 1.4 version of Jackrabbit? Our 
text-extractors build has the following info in its META-INF 
pom.properties file:

#Generated by Maven
#Fri Jan 11 14:40:02 EET 2008
version=1.4
groupId=org.apache.jackrabbit
artifactId=jackrabbit-text-extractors

Regards,

Johannes
