pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Corrupted PDF file causing severe OOM
Date Thu, 16 May 2019 03:41:58 GMT
On 15.05.2019 at 21:57, Slava G wrote:
> But I tried to extract text using 2.0.15 and immediately got an exception and
> didn't get an OOM.


I got a slow response on the second page. I didn't wait for the OOM.

Tilman



>
> On Wed, May 15, 2019, 22:52 Tilman Hausherr <THausherr@t-online.de> wrote:
>
>> On 15.05.2019 at 16:00, Slava G wrote:
>>> But seems that in PDFBox 2.0.15 it's already fixed as, when I run
>> tika-app
>>
>>
>> No it's not fixed. The cause is a corrupt ToUnicode stream. Fixed in
>>
>> https://issues.apache.org/jira/browse/PDFBOX-4550
>>
>> Try a snapshot within a few hours
>>
>>
>> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.16-SNAPSHOT/
>>
>> Tilman
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>
>>



