pdfbox-dev mailing list archives

From Andreas Lehmkühler (JIRA) <j...@apache.org>
Subject [jira] [Commented] (PDFBOX-4550) Poor performance with corrupt ToUnicode stream
Date Mon, 17 Jun 2019 06:00:10 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865346#comment-16865346 ]

Andreas Lehmkühler commented on PDFBOX-4550:
--------------------------------------------

Looks like we have to make the CMapParser lenient as well. I'm on it.

> Poor performance with corrupt ToUnicode stream
> ----------------------------------------------
>
>                 Key: PDFBOX-4550
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4550
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering, Text extraction
>    Affects Versions: 2.0.15
>            Reporter: Tilman Hausherr
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 2.0.16, 3.0.0 PDFBox
>
>         Attachments: 169997-p1.pdf, EF6E2XR2XAXWUHZV5STGMYPWNNDLXDDT-p2.pdf, PDFBOX-3442-DirectResources.pdf, PDFBOX-3442-DirectResources_unc.pdf, PDFBOX-4550-LG5S35JUXSEH5XJC6QYISY3OBUXCKAKR-p1-reduced.pdf, pdnekz1gvl7.pdf
>
>
> A confidential file with many corrupt streams has a ToUnicode stream whose beginbfrange segment contains corrupt entries in which the start and end codes have different lengths. This leads to poor performance. Such entries can be skipped.
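
The skip described above can be sketched as follows. This is only an illustration, not the actual PDFBox CMapParser code: the class BfRangeFilter, the Range type, and the helpers skipCorrupt and hex are hypothetical names made up for this example. It assumes each beginbfrange entry has already been tokenized into a start and an end code given as raw bytes, and it drops every entry whose two codes differ in length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox: filters bfrange entries
// whose start and end codes have different byte lengths.
class BfRangeFilter {

    // One bfrange entry: start and end character codes as raw bytes.
    static final class Range {
        final byte[] start;
        final byte[] end;

        Range(byte[] start, byte[] end) {
            this.start = start;
            this.end = end;
        }
    }

    // Returns only the entries whose start and end codes have the same length.
    // Entries with mismatched lengths are treated as corrupt and skipped.
    static List<Range> skipCorrupt(List<Range> ranges) {
        List<Range> kept = new ArrayList<>();
        for (Range r : ranges) {
            if (r.start.length == r.end.length) {
                kept.add(r);
            }
        }
        return kept;
    }

    // Converts an even-length hex string such as "0041" into bytes.
    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Range> ranges = Arrays.asList(
                new Range(hex("0041"), hex("005A")), // 2-byte start, 2-byte end: kept
                new Range(hex("00"), hex("00FF")));  // 1-byte start, 2-byte end: skipped
        System.out.println("usable entries: " + skipCorrupt(ranges).size());
    }
}
```

Checking the lengths before any range expansion means a corrupt entry costs a single comparison. Whether the real parser should also log or count the skipped entries is left open here.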



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org

