pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Parsing read-only PDFs
Date Sat, 02 Apr 2016 15:38:30 GMT
On 02.04.2016 at 17:35, Ian Rogers wrote:
> Hi,
>
> I am using PDFBox 1.8.2 because I installed an available NuGet package for
> .NET.
>
> My question is this. I am reading in the PDF files with the following
> commands:
>              PDDocument pdDoc = PDDocument.load(path_to_file);

Use openProtection() on the loaded document to decrypt it before reading the content stream.

http://stackoverflow.com/a/29676278/535646

Tilman

>              java.util.List allPages = pdDoc.getDocumentCatalog().getAllPages();
>              PDPage firstPage = (PDPage)allPages.get(0);
>              PDStream contents = firstPage.getContents();
>              COSStream content = contents.getStream();
>              Debug.WriteLine(content.getStreamTokens());
>
> This works great until the PDF has password security that does not allow
> modifying the contents but does allow freely reading and copying them. In
> that case I get an IO exception with the following stack trace:
>
>     at org.apache.pdfbox.cos.COSStream.doDecode(COSName , Int32 )
>     at org.apache.pdfbox.cos.COSStream.doDecode()
>     at org.apache.pdfbox.cos.COSStream.getUnfilteredStream()
>     at org.apache.pdfbox.pdfparser.PDFStreamParser..ctor(COSStream stream)
>     at org.apache.pdfbox.cos.COSStream.getStreamTokens()
>
> I used the PDFTextStripper utility, and it parses PDF documents fine both
> with and without the above-mentioned password security. I looked through
> the 1.8.10 source to compare it with what I am doing, but I can't see where
> I am going wrong.
>
> Any help or pointers would be much appreciated.
>
> Thanks,
> Ian
>
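
The openProtection() suggestion in the reply can be sketched roughly as follows against the PDFBox 1.8.x Java API. This is only an illustration, not code from the thread: the class name ReadProtectedPdf and the method firstPageTokens are hypothetical, and it assumes the file carries only an owner password, so that the empty user password is enough to open it. In the .NET build the same calls should be reachable through the converted assemblies, but that is an assumption and has not been checked here.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.cos.COSStream;
import org.apache.pdfbox.exceptions.CryptographyException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.encryption.BadSecurityHandlerException;
import org.apache.pdfbox.pdmodel.encryption.StandardDecryptionMaterial;

// Sketch for PDFBox 1.8.x; the class and method names are hypothetical.
final class ReadProtectedPdf {

    // Returns the parsed content-stream tokens of the first page.
    static List<?> firstPageTokens(File pdf)
            throws IOException, BadSecurityHandlerException, CryptographyException {
        PDDocument doc = PDDocument.load(pdf);
        try {
            if (doc.isEncrypted()) {
                // Assumption: only an owner password is set, so the empty
                // user password is sufficient to decrypt the document.
                doc.openProtection(new StandardDecryptionMaterial(""));
            }
            List<?> pages = doc.getDocumentCatalog().getAllPages();
            PDPage firstPage = (PDPage) pages.get(0);
            // Note: getContents() can be null for a page without content;
            // that case is not handled in this sketch.
            COSStream content = firstPage.getContents().getStream();
            // Tokens are parsed here, before the document is closed.
            return content.getStreamTokens();
        } finally {
            doc.close();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: ReadProtectedPdf <file.pdf>");
            return;
        }
        for (Object token : firstPageTokens(new File(args[0]))) {
            System.out.println(token);
        }
    }
}
```

If the document also requires a user password to open, the empty string will not work and the real password would have to be passed to StandardDecryptionMaterial instead.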

