pdfbox-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: Stream parsing issue in multi-stream page
Date Mon, 05 Feb 2018 15:25:55 GMT
Hi,



> Am 05.02.2018 um 15:43 schrieb Esteban R <eruiz0@hotmail.com>:
> 
> Hello. I need to rewrite a PDPage with many streams, one by one (making some transformations,
and there is a special need to do it one stream at a time). Parsing (and pdfdebug) returns
"wrong" tokens if one command begins at the end of the first stream and ends at the begining
of the next one. I'm using pdfbox-2.0.8.
> 
> Rewriting the stream with those tokens produces a corrupted page.
> How could we re-write the page without getting a corrupted page?
> Or, at least, how can we detect this kind of failures (or this one)?
> 
> Please find a simplified example here:
> http://www.filedropper.com/out3unc
> 
> The first stream is:
> /F1 10 Tf
> BT
> 40 764.138 Td
> 0 -12.138 Td
> [
> 
> and the second one is:
> (CD) ] TJ
> ET
> 
> In this case, running the following code:
>        Iterator<PDStream> itStreams = pdPage.getContentStreams();
>        while (itStreams.hasNext()) {
>            PDStream pdstream = itStreams.next();
>            PDFStreamParser parser = new PDFStreamParser(pdstream.toByteArray());
>            parser.parse();
>            List<Object> tokens = parser.getTokens();
>            for (Object token: tokens){
>                System.out.println("Token: "+token);
>            }
>        }
> 

instead of using pdPage.getContentStreams() and parsing the stream individually use pdPage.getContents()
and read all content into a byte[]. You can then pass that to PDFStreamParser.

That will give you this output 

Token: COSName{F1}
Token: COSInt{10}
Token: PDFOperator{Tf}
Token: PDFOperator{BT}
Token: COSInt{40}
Token: COSFloat{764.138}
Token: PDFOperator{Td}
Token: COSInt{0}
Token: COSFloat{-12.138}
Token: PDFOperator{Td}
Token: COSArray{[COSString{CD}]}
Token: PDFOperator{TJ}
Token: PDFOperator{ET}

BR
Maruan


> shows:
> Token: COSName{F1}
> Token: COSInt{10}
> Token: PDFOperator{Tf}
> Token: PDFOperator{BT}
> Token: COSInt{40}
> Token: COSFloat{764.138}
> Token: PDFOperator{Td}
> Token: COSInt{0}
> Token: COSFloat{-12.138}
> Token: PDFOperator{Td}
> Token: COSArray{[]}                    !!!!! empty array detected, end of first stream
> Token: COSString{CD}                 !!!!! begining of second stream
> Token: COSNull{}                         !!!!! closing "]"
> Token: PDFOperator{TJ}
> Token: PDFOperator{ET}
> 
> 
> Esteban


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Mime
View raw message