pdfbox-dev mailing list archives

From Andreas Lehmkühler (JIRA) <j...@apache.org>
Subject [jira] [Commented] (PDFBOX-4550) Poor performance with corrupt ToUnicode stream
Date Tue, 18 Jun 2019 16:28:00 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866806#comment-16866806 ]

Andreas Lehmkühler commented on PDFBOX-4550:
--------------------------------------------

The last files weren't corrupt. Their CMaps contain an array with more than 255 values.
I've limited the check (range with more than 255 values) to the increment variant. Everything
should work again.
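
For readers unfamiliar with the two bfrange forms: an entry either gives a single destination string whose last byte is incremented across the range (`<start> <end> <dst>`), or an array that lists one destination per source code (`<start> <end> [<dst1> <dst2> ...]`). The sketch below illustrates restricting a size check to the first form only. It is not the PDFBox code: the class and method names are hypothetical, the threshold of 255 is simply taken from the wording of the comment above, and what the real parser does with a flagged entry is not stated in this message.

```java
// Hypothetical sketch, not the actual PDFBox CMap parsing code.
final class BfRangeSizeCheck
{
    // Assumption: threshold taken from the wording "more than 255 values".
    static final int MAX_INCREMENT_VALUES = 255;

    /** Number of source codes covered by a bfrange entry, bounds inclusive. */
    static int rangeSize(int start, int end)
    {
        return end - start + 1;
    }

    /**
     * Returns true if the entry is flagged by the size check.
     * Only the increment variant is checked. The array variant lists
     * every destination explicitly, so a long range is legitimate there.
     */
    static boolean isSuspicious(int start, int end, boolean arrayVariant)
    {
        if (arrayVariant)
        {
            return false;
        }
        return rangeSize(start, end) > MAX_INCREMENT_VALUES;
    }
}
```

Under these assumptions, a 300-code range is flagged only when it uses a single incrementing destination string, and passes when the destinations are given as an array.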

> Poor performance with corrupt ToUnicode stream
> ----------------------------------------------
>
>                 Key: PDFBOX-4550
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4550
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering, Text extraction
>    Affects Versions: 2.0.15
>            Reporter: Tilman Hausherr
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 2.0.16, 3.0.0 PDFBox
>
>         Attachments: 169997-p1.pdf, EF6E2XR2XAXWUHZV5STGMYPWNNDLXDDT-p2.pdf, PDFBOX-3442-DirectResources.pdf, PDFBOX-3442-DirectResources_unc.pdf, PDFBOX-4550-LG5S35JUXSEH5XJC6QYISY3OBUXCKAKR-p1-reduced.pdf, pdnekz1gvl7.pdf
>
>
> A confidential file with lots of corrupt streams has a ToUnicode stream with corrupt contents
> in the beginbfrange segment, where start and end have different lengths. This leads to poor
> performance. Such entries can be skipped.
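
The skip rule described in the issue can be sketched as follows. This is a minimal illustration, not the PDFBox implementation: the class and method names are hypothetical, and the hex-token parsing is a simplified stand-in that assumes well-formed `<...>` tokens with an even number of hex digits.

```java
// Hypothetical sketch, not the actual PDFBox CMap parsing code.
final class BfRangeLengthCheck
{
    /** Parses a hex string token such as "<00FF>" into bytes (assumes an even digit count). */
    static byte[] parseHex(String token)
    {
        String hex = token.substring(1, token.length() - 1); // strip '<' and '>'
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++)
        {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** A bfrange entry is skipped when its start and end codes differ in byte length. */
    static boolean shouldSkip(byte[] start, byte[] end)
    {
        return start.length != end.length;
    }

    /** Counts the entries that survive the check; each entry is {startToken, endToken}. */
    static int countKept(String[][] entries)
    {
        int kept = 0;
        for (String[] entry : entries)
        {
            if (!shouldSkip(parseHex(entry[0]), parseHex(entry[1])))
            {
                kept++;
            }
        }
        return kept;
    }
}
```

Rejecting the entry before any range expansion happens is the point of the check: a mismatched pair such as `<00>` and `<FFFF>` is dropped without iterating over the codes in between.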



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org

