pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Possible memory leak when extracting text?
Date Fri, 10 May 2019 17:06:48 GMT
I could test on Tim's server, but Andreas already tested on Linux. I
don't know anything about Docker.

Tilman

On 10.05.2019 at 15:52, Søren Pedersen wrote:
> I have done some more testing, and I found that when I run on Windows there are no problems,
> but when I run on Linux I get the memory leak. Tilman, would you be able to run the same test
> on a Linux box? - or maybe using a Linux Docker container, like I showed originally?
>
> We would prefer to run our app on Linux, but this looks like a blocker for that unfortunately
> :(
>
> Best regards,
> Søren Pedersen
> On 10 May 2019, 09.32 +0200, Søren Pedersen <sh.pedersen@gmail.com>, wrote:
>> Ok, thanks a lot for looking into this Tilman. I will try your suggestion and keep
>> fiddling with it :)
>>
>> Have a great weekend!
>> On 10 May 2019, 08.12 +0200, Tilman Hausherr <THausherr@t-online.de>, wrote:
>>> On 10.05.2019 at 07:22, Søren Pedersen wrote:
>>>> We have an application that can index the contents of PDF files, so that we
>>>> can use that for a search algorithm. We use the Apache PDFBox library for
>>>> extracting text from a PDF, like this (where inputStream is a
>>>> ByteArrayInputStream containing the contents of the PDF file):
>>>>
>>>> PDFTextStripper pdfStripper = new PDFTextStripper();
>>>> pdDoc = PDDocument.load(inputStream,
>>>> MemoryUsageSetting.setupTempFileOnly());
>>>> String parsedText = pdfStripper.getText(pdDoc);
>>>
>>> You can pass the byte[] directly to load(). Also make sure that the
>>> bytes are not altered in any way, e.g. through an incorrectly configured
>>> web download, or incorrectly configured resource loading
>>> ("filtering" option must be false).
>>>
>>>
>>> Also retry with 2.0.16 snapshot.
>>>
>>> Tilman
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>>
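
The advice above about unaltered bytes can be checked with a digest comparison. What follows is only a rough sketch using the Java standard library: the class name PdfBytesCheck and the method sha256Hex are made up for illustration and are not part of PDFBox. It simulates one way bytes can get altered (a round trip through a text decoder, which is an assumption about what a misconfigured download or resource filter might do) and shows that the digest changes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of PDFBox: fingerprints a byte[] so the same
// PDF can be compared on two machines or before and after a download.
class PdfBytesCheck {

    /** Returns the SHA-256 digest of the given bytes as lowercase hex. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "%PDF-" followed by two non-ASCII bytes, standing in for binary PDF data.
        byte[] original = {0x25, 0x50, 0x44, 0x46, 0x2d, (byte) 0xE2, (byte) 0xE3};

        // An untouched copy has the same digest.
        byte[] copy = original.clone();

        // Simulated damage: decoding binary data as UTF-8 text and re-encoding
        // replaces the malformed byte sequences, so the content changes.
        byte[] mangled = new String(original, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);

        String expected = sha256Hex(original);
        System.out.println("copy matches:    " + expected.equals(sha256Hex(copy)));
        System.out.println("mangled matches: " + expected.equals(sha256Hex(mangled)));
    }
}
```

If the digest computed inside the application matches the digest of the file on disk (for example from sha256sum on Linux), the byte[] can be handed to load() as is, and the input can be ruled out as the difference between the Windows and Linux runs.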
