pdfbox-dev mailing list archives

From "Artur Jablonski (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PDFBOX-4247) Access permissions read by pdfbox are wrong.
Date Thu, 21 Jun 2018 10:56:00 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519225#comment-16519225 ]

Artur Jablonski commented on PDFBOX-4247:

Hmmm... OK, I don't think I follow your comment, most likely because I have only a very
vague idea of PDF format internals.

So what you're saying is that the reason I don't get anything using the `PDFTextStripper` class
has nothing to do with permissions, but with the way the text is represented in the file:
not as a collection of glyphs, but as some sort of vector graphics.

If that's the case, is there any accurate, programmatic way via PDFBox to detect this 'vector'
text and then run OCR on it?
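
To make the question concrete, here is a rough sketch of the kind of check I have in mind. It
assumes the text of each page has already been extracted separately (for example with
`PDFTextStripper`, limiting it to one page at a time via `setStartPage`/`setEndPage`) and simply
flags pages that yield no text. The class and method names (`OcrCandidates`, `blankPages`) are
made up for illustration and are not part of PDFBox. A blank result does not prove the text is
drawn as vectors; the page could equally be empty or a scanned image.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: picks pages that may need OCR.
final class OcrCandidates {
    private OcrCandidates() {}

    /**
     * Returns the 1-based numbers of pages whose extracted text is
     * null, empty, or whitespace only.
     *
     * @param pageTexts extracted text per page, in page order
     *                  (assumed to come from a text stripper run page by page)
     */
    static List<Integer> blankPages(List<String> pageTexts) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            String text = pageTexts.get(i);
            if (text == null || text.trim().isEmpty()) {
                result.add(i + 1); // page numbers start at 1
            }
        }
        return result;
    }
}
```

Pages returned by `blankPages` would then be the ones to render and hand to an OCR engine, if
that is a sensible approach at all.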

> Access permissions read by pdfbox are wrong.
> --------------------------------------------
>                 Key: PDFBOX-4247
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4247
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.8, 2.0.9
>            Reporter: Artur Jablonski
>            Priority: Major
>         Attachments: PDFBOX-4247.pdf
> A PDF that Acrobat Reader shows as not granting permission to extract content or assemble
the document returns {{true}} for both {{PDDocument.getCurrentAccessPermission().canExtractContent()}}
and {{PDDocument.getCurrentAccessPermission().canAssembleDocument()}} when parsed with PDFBox.
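
For reference, the two flags in question correspond to bits of the integer /P entry in the
document's encryption dictionary: bit 5 for content extraction and bit 11 for document assembly,
counting from 1 at the low-order bit, as defined in the PDF specification. The following is only
a minimal sketch of decoding such a value by hand, so the raw number can be compared with what a
viewer or library reports. The class and method names (`PermissionBits`, `isSet`, and so on) are
hypothetical and not PDFBox API.

```java
// Hypothetical helper, not part of PDFBox: decodes two flags from a raw /P value.
final class PermissionBits {
    // Bit positions are 1-based, with bit 1 being the low-order bit.
    static final int EXTRACT_BIT = 5;   // copy or extract text and graphics
    static final int ASSEMBLE_BIT = 11; // insert, rotate, or delete pages

    private PermissionBits() {}

    /** Returns true if the given 1-based bit is set in the /P value. */
    static boolean isSet(int p, int bit) {
        return (p & (1 << (bit - 1))) != 0;
    }

    static boolean canExtractContent(int p) {
        return isSet(p, EXTRACT_BIT);
    }

    static boolean canAssembleDocument(int p) {
        return isSet(p, ASSEMBLE_BIT);
    }
}
```

For example, a /P value of -1041 has both bits cleared, so both methods return false, while -1
has every bit set. These bits describe what is granted under user access; a document opened with
owner access is not restricted by them.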

This message was sent by Atlassian JIRA
