pdfbox-users mailing list archives

From Tim Allison <talli...@apache.org>
Subject Fwd: Memory Errors with PDFBOX
Date Wed, 30 Jan 2019 16:13:37 GMT
Forwarding to the correct PDFBox address... sorry for the noise.

---------- Forwarded message ---------
From: Tim Allison <tallison@apache.org>
Date: Wed, Jan 30, 2019 at 10:29 AM
Subject: Re: Memory Errors with PDFBOX
To: <user@tika.apache.org>, Jim <jimjim@protonmail.com>, <users@pdfbox.org>


@PDFBox colleagues,
  Any thoughts/recommendations?

On Wed, Jan 30, 2019 at 9:43 AM Jim <jimjim@protonmail.com> wrote:
>
> I have a simple Tika REST service that accepts a Base64-encoded string (for testing, the string is a PDF file).
>
> The REST service Base64-decodes the string and passes the resulting binary PDF content to Tika for text extraction.
>
> Locally, on an iMac with 16 GB of RAM, all of this works fine, even with a 150 MB PDF. No errors at all.
>
> Yet on an AWS Windows Server 2008 instance (t3.xlarge), also with 16 GB of RAM, I get the stack trace below.
>
> I've tried increasing the memory available to Tomcat (via the CATALINA_OPTS environment variable on the Windows AWS server), whereas on the iMac I don't configure anything special at all. The iMac and the Windows server run the same version of the service, with the Tika 1.20 libraries.
>
> Would appreciate any advice or suggestions.
>
> Thanks very much.
>
> ERROR STACK:
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:115)
> at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:949)
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:632)
> at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:876)
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:152)
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:279)
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:212)
> at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:864)
> at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:88)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:993)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:879)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:793)
> at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:753)
> at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1200)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1173)
> at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:153)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> at com.alias.ws.service.TextExtractionService.extractText(TextExtractionService.java:40)
> at com.alias.ws.controllers.TextExtractionController.extractText(TextExtractionController.java:40)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:209)
> at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
> at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
> at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:877)
> at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:783)
>
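A rough size estimate for the pipeline described in the quoted message: Base64 turns every 3 bytes into 4 characters, so a 150 MiB PDF becomes about 210 million characters, and on Java 8 (where a `String` stores 2 bytes per character) that is roughly 400 MiB of heap before the decoded `byte[]` (another 150 MiB) and PDFBox's own parse structures are even allocated. Assuming the service currently reads the whole payload into a `String`, one option is to decode the request stream directly into a temporary file and let Tika read the PDF from disk. The sketch below uses only `java.util.Base64` and `java.nio.file`; the class and method names (`Base64Spool`, `decodeToTempFile`, `encodedLength`) are hypothetical, and where the `InputStream` comes from (for example, the servlet request body) is left to the service.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Base64;

public class Base64Spool {

    /** Number of Base64 characters (with padding) needed for rawBytes bytes. */
    public static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    /**
     * Hypothetical helper: decodes a Base64 stream straight into a temp file,
     * so neither the encoded text nor the decoded PDF is held as one large
     * array on the heap. The basic decoder rejects line breaks; use
     * Base64.getMimeDecoder() instead if the client wraps its output.
     */
    public static Path decodeToTempFile(InputStream base64In) throws IOException {
        Path tmp = Files.createTempFile("upload-", ".pdf");
        // wrap() decodes lazily as bytes are pulled from base64In
        try (InputStream decoded = Base64.getDecoder().wrap(base64In)) {
            // REPLACE_EXISTING because createTempFile already created the file
            Files.copy(decoded, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // don't leave partial files behind
            throw e;
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        // 150 MiB of PDF -> 209715200 Base64 characters
        System.out.println(encodedLength(150L * 1024 * 1024));

        byte[] fakePdf = "%PDF-1.4 demo".getBytes(StandardCharsets.US_ASCII);
        String encoded = Base64.getEncoder().encodeToString(fakePdf);
        Path tmp = decodeToTempFile(
                new ByteArrayInputStream(encoded.getBytes(StandardCharsets.US_ASCII)));
        System.out.println(new String(Files.readAllBytes(tmp), StandardCharsets.US_ASCII));
        Files.delete(tmp);
    }
}
```

This only removes the extra copies the service itself makes; PDFBox still builds its object graph in memory while parsing, so it may reduce heap pressure rather than eliminate the error.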

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org
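On the Tomcat side, one thing worth ruling out (an assumption, not a diagnosis) is that the CATALINA_OPTS change never reached the JVM. When Tomcat is installed as a Windows service, its JVM options are normally set in the service configuration rather than read from CATALINA_OPTS, which is consumed by the catalina.bat startup scripts. A 32-bit JVM also cannot use a heap anywhere near 16 GB. Logging the effective settings from inside the web application removes the guesswork; the sketch below uses only standard JDK APIs, and the class name `HeapReport` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class HeapReport {

    /** Maximum heap this JVM will try to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** JVM options as the JVM actually received them (e.g. -Xmx8g). */
    public static List<String> jvmArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("JVM args: " + jvmArgs());
        // HotSpot-specific property ("32" or "64"); may be null on other JVMs
        System.out.println("Data model: " + System.getProperty("sun.arch.data.model"));
    }
}
```

Calling the same two methods from the service on both the iMac and the AWS server would show whether the two Tomcats are really running with comparable heaps.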

