From: Stanea Paul
To: users@pdfbox.apache.org
Subject: java.lang.OutOfMemoryError: Java heap space
Date: Tue, 8 Oct 2013 04:51:00 -0700 (PDT)

Hello,

I am trying to index PDF documents in Solr, using PDFBox 1.8.2.
Until now everything worked fine, but I received a document that could not be indexed (see attachment).
This is the stack trace:
SEVERE: Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
 org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
	at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:413)
	at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326)
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)
	... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:542)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:411)
	... 5 more
Caused by: java.lang.OutOfMemoryError: Java heap space
	at org.apache.fontbox.cff.IndexData.initData(IndexData.java:95)
	at org.apache.fontbox.cff.CFFParser.readIndexData(CFFParser.java:152)
	at org.apache.fontbox.cff.CFFParser.parse(CFFParser.java:103)
	at org.apache.pdfbox.pdmodel.font.PDType1CFont.load(PDType1CFont.java:322)
	at org.apache.pdfbox.pdmodel.font.PDType1CFont.<init>(PDType1CFont.java:104)
	at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:162)
	at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:92)
	at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:203)
	at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:604)
	at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)
	at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
	at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
	at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
	at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
	at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:455)
	at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:379)
	at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:335)
	at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:66)
	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:153)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
	at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:127)
	at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:472)
	... 6 more
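
The trace above is a chain of wrapped exceptions, so the last "Caused by" entry (the OutOfMemoryError raised in org.apache.fontbox.cff.IndexData.initData) is the actual failure; the Solr and Tika frames only rethrow it. A minimal sketch of unwrapping such a chain in plain Java follows. The class and method names are hypothetical, and a plain Exception stands in for Solr's DataImportHandlerException so the example needs no external libraries:

```java
public class RootCause {

    /** Follows getCause() links until the innermost throwable is reached. */
    static Throwable rootCauseOf(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Rebuild the same nesting as in the trace: two RuntimeExceptions
        // around a handler exception (stand-in) around the OutOfMemoryError.
        Throwable oom = new OutOfMemoryError("Java heap space");
        Throwable handler = new Exception(oom); // stand-in for DataImportHandlerException
        Throwable wrapped = new RuntimeException(new RuntimeException(handler));

        Throwable root = rootCauseOf(wrapped);
        System.out.println(root); // prints "java.lang.OutOfMemoryError: Java heap space"
    }
}
```
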
Is there a fix for this issue?

Thanks,
Paul







