pdfbox-users mailing list archives

From Andreas Lehmkuehler <andr...@lehmi.de>
Subject Re: (pdffile) does not allow extracting content
Date Tue, 23 Feb 2016 17:03:19 GMT
Hi

Am 23.02.2016 um 17:53 schrieb Brzrk One:
> With pdfbox-1.8.11, using the bottom-up parser (loadNonSeq) on a document
> that has security ContentCopying: NotAllowed results in:
>
> org.apache.pdfbox.pdfparser.NonSequentialPDFParser - PDF file
> 'some_temp_file.pdf' does not allow extracting content
>
> And the output pages are all blank.
>
> The top-down parser (load) has no such issue.
>
> Is there a workaround?
It's not a bug but a feature: PDFBox respects the document's security settings.
In your case, the sequential parser (load) isn't capable of reading that
information, so it doesn't block the text extraction as it should.

BR
Andreas
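
For readers wondering what "respects the security settings" amounts to, here is a minimal sketch of how the extraction permission is encoded. It assumes the standard security handler, where the encryption dictionary's /P entry is a 32-bit flag word and bit 5 (counting from 1) controls copying or extracting text and graphics, as described in the PDF specification. The class and method names below (PermissionBits, isBitSet, canExtractContent, canPrint) are made up for illustration and are not PDFBox API; in PDFBox itself the equivalent check is, as far as I know, AccessPermission.canExtractContent() on the object returned by PDDocument.getCurrentAccessPermission().

```java
// Illustrative helper, NOT part of PDFBox: decodes the /P permission flags
// of a PDF encryption dictionary (standard security handler).
final class PermissionBits {
    // Bit positions are 1-based, following the PDF specification's numbering.
    static final int PRINT_BIT = 3;    // print the document
    static final int EXTRACT_BIT = 5;  // copy or extract text and graphics

    private PermissionBits() {
    }

    // Returns true if the given 1-based bit is set in the flag word.
    static boolean isBitSet(int flags, int bit) {
        return (flags & (1 << (bit - 1))) != 0;
    }

    // "ContentCopying: NotAllowed" corresponds to this returning false.
    static boolean canExtractContent(int flags) {
        return isBitSet(flags, EXTRACT_BIT);
    }

    static boolean canPrint(int flags) {
        return isBitSet(flags, PRINT_BIT);
    }
}
```

With a flag word of -1 (all bits set) both checks return true. With -17 (all bits set except bit 5), canExtractContent returns false while canPrint still returns true, which matches a document that allows printing but forbids content copying.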




