pdfbox-dev mailing list archives

From Andreas Lehmkühler (JIRA) <j...@apache.org>
Subject [jira] [Commented] (PDFBOX-4550) Poor performance with corrupt ToUnicode stream
Date Thu, 13 Jun 2019 17:32:00 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863321#comment-16863321 ]

Andreas Lehmkühler commented on PDFBOX-4550:
--------------------------------------------

I've changed the conditions for the identity detection. They are less strict now, so that the
identity encoding is used if the encoding is "Identity-H" or "Identity-V" *or* the name or
ordering string of the cmap contains "identity".
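
The relaxed condition can be sketched roughly as follows. This is a minimal illustration, not the committed PDFBox change: the class name, the method names, and the plain String parameters are hypothetical stand-ins for the real font and cmap objects, and the case-insensitive match is an assumption.

```java
import java.util.Locale;

// Hypothetical helper, not part of PDFBox: models the relaxed identity check.
class IdentityEncodingCheck {

    // Returns true if the identity encoding should be used.
    // encodingName, cmapName and cmapOrdering are plain strings here (assumption);
    // any of them may be null when the corresponding entry is missing.
    static boolean usesIdentityEncoding(String encodingName, String cmapName, String cmapOrdering) {
        // Condition 1: the encoding itself is Identity-H or Identity-V.
        if ("Identity-H".equals(encodingName) || "Identity-V".equals(encodingName)) {
            return true;
        }
        // Condition 2 (the relaxed part): the cmap name or ordering mentions "identity".
        return containsIdentity(cmapName) || containsIdentity(cmapOrdering);
    }

    // Case-insensitive substring test; case-insensitivity is an assumption of this sketch.
    private static boolean containsIdentity(String value) {
        return value != null && value.toLowerCase(Locale.ROOT).contains("identity");
    }
}
```

With this sketch, a font whose encoding is something else but whose cmap ordering is "Identity" would still be treated as identity-encoded, which is the effect of joining the two conditions with "or".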

> Poor performance with corrupt ToUnicode stream
> ----------------------------------------------
>
>                 Key: PDFBOX-4550
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4550
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering, Text extraction
>    Affects Versions: 2.0.15
>            Reporter: Tilman Hausherr
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 2.0.16, 3.0.0 PDFBox
>
>         Attachments: PDFBOX-3442-DirectResources.pdf, PDFBOX-3442-DirectResources_unc.pdf,
>                      PDFBOX-4550-LG5S35JUXSEH5XJC6QYISY3OBUXCKAKR-p1-reduced.pdf, pdnekz1gvl7.pdf
>
>
> A confidential file with lots of corrupt streams has a ToUnicode stream with corrupt contents
> in the beginbfrange segment, where start and end have different lengths. This leads to poor
> performance. Such entries can be skipped.
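
The skipping described in the issue can be sketched like this. It is an illustration under stated assumptions, not the PDFBox parser code: the class BfRangeFilter, its nested Range type, and both method names are hypothetical, and a bfrange entry is reduced to its start and end codes as byte arrays.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: drops bfrange entries with mismatched code lengths.
class BfRangeFilter {

    // One bfrange entry, reduced to the first and last source code of the range (assumption).
    static final class Range {
        final byte[] start;
        final byte[] end;

        Range(byte[] start, byte[] end) {
            this.start = start;
            this.end = end;
        }
    }

    // A range is usable only if both codes are present and have the same byte length.
    static boolean hasConsistentLength(Range range) {
        return range.start != null && range.end != null
                && range.start.length == range.end.length;
    }

    // Returns a new list containing only the usable entries; corrupt ones are skipped.
    static List<Range> dropCorrupt(List<Range> ranges) {
        List<Range> kept = new ArrayList<>();
        for (Range range : ranges) {
            if (hasConsistentLength(range)) {
                kept.add(range);
            }
        }
        return kept;
    }
}
```

Rejecting such an entry before any attempt to iterate from start to end keeps the cost per corrupt entry constant, which is the point of skipping them.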



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
