pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Fwd: Memory Errors with PDFBOX
Date Wed, 30 Jan 2019 19:36:53 GMT
It would be interesting to know whether the issue can be reproduced with 
PDFBox alone, i.e. by just loading the file (or rather the input stream, 
it seems) in the Tomcat servlet.

If it can be reproduced, would it be possible to set up a non-AWS 
Tomcat with the same problem? And if so, what are the settings?

All of this should be tested on the same Java version. (Which one is 
being used?)
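
To compare the settings on both machines, something like the following could be run inside the same JVM on each side (for example, called once from the servlet). This is only a sketch that uses standard JDK calls; the class name JvmSettingsReport and the labels in the output are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper (not part of PDFBox or Tika): reports the Java version
// and the memory settings of the JVM it runs in.
public class JvmSettingsReport {

    /** Builds a short, human-readable report of version and heap settings. */
    public static String report() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;

        // Arguments the JVM was started with, e.g. -Xmx... set via CATALINA_OPTS.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();

        StringBuilder sb = new StringBuilder();
        sb.append("java.version=").append(System.getProperty("java.version")).append('\n');
        sb.append("java.vm.name=").append(System.getProperty("java.vm.name")).append('\n');
        sb.append("os.arch=").append(System.getProperty("os.arch")).append('\n');
        // maxMemory() is the upper bound the heap may grow to.
        sb.append("maxHeapMB=").append(rt.maxMemory() / mb).append('\n');
        // totalMemory() is the heap currently reserved by the JVM.
        sb.append("currentHeapMB=").append(rt.totalMemory() / mb).append('\n');
        sb.append("jvmArgs=").append(jvmArgs).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

If maxHeapMB differs between the two machines, or jvmArgs does not show the values set in CATALINA_OPTS, that would be worth mentioning.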

Tilman



Am 30.01.2019 um 17:13 schrieb Tim Allison:
> forwarding to the correct pdfbox address... sorry for the noise...
>
> ---------- Forwarded message ---------
> From: Tim Allison <tallison@apache.org>
> Date: Wed, Jan 30, 2019 at 10:29 AM
> Subject: Re: Memory Errors with PDFBOX
> To: <user@tika.apache.org>, Jim <jimjim@protonmail.com>, <users@pdfbox.org>
>
>
> @PDFBox colleagues,
>    Any thoughts/recommendations?
>
> On Wed, Jan 30, 2019 at 9:43 AM Jim <jimjim@protonmail.com> wrote:
>> I have a simple Tika REST service that accepts a Base64Encoded String
>> (which for testing is a PDF File in this case).
>>
>> The REST service that receives the string Base64-decodes the string and
>> passes it to Tika for file text extraction (from the binary PDF content
>> after Base64 Decode).
>>
>> Locally, on an iMac with 16 GB, all this works fine. Even with a PDF
>> that's 150 MB! No errors at all.
>>
>> Yet, using an AWS Windows 2008 server also with 16 GB RAM (t3.xlarge),
>> I get the error stack below.
>>
>> I've tried upping the memory used by Tomcat (CATALINA_OPTS environment
>> variable in Windows on AWS), but locally on the iMac, I don't do anything
>> special at all for all to work. Both the working iMac and Windows have
>> the same version of the service with Tika 1.20 libs.
>>
>> Would appreciate any advice or suggestions.
>>
>> Thanks very much.
>>
>> ERROR STACK:
>>
>> "java.lang.OutOfMemoryError: GC overhead limit exceeded
>> at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:115)
>> at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:949)
>> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:632)
>> at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:876)
>> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:152)
>> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:279)
>> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:212)
>> at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:864)
>> at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:88)
>> at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:993)
>> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:879)
>> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:793)
>> at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:753)
>> at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
>> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
>> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1200)
>> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1173)
>> at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:153)
>> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>> at com.alias.ws.service.TextExtractionService.extractText(TextExtractionService.java:40)
>> at com.alias.ws.controllers.TextExtractionController.extractText(TextExtractionController.java:40)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:209)
>> at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
>> at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
>> at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:877)
>> at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:783)
>>
>>
>>
>> Sent from ProtonMail, Swiss-based encrypted email.
>>
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
>



