pdfbox-dev mailing list archives

From Andreas Lehmkühler (JIRA) <j...@apache.org>
Subject [jira] [Resolved] (PDFBOX-383) BaseParser incorrectly handling stream, exhibiting IOException
Date Fri, 18 May 2012 16:59:07 GMT

     [ https://issues.apache.org/jira/browse/PDFBOX-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-383.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.7.0
         Assignee: Andreas Lehmkühler

The attached PDFs work fine with the new non-sequential parser, see PDFBOX for details.
                
> BaseParser incorrectly handling stream, exhibiting IOException
> --------------------------------------------------------------
>
>                 Key: PDFBOX-383
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-383
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 0.7.3
>         Environment: PDFBox 0.7.3 with Java 5 on Windows
>            Reporter: Son
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.7.0
>
>         Attachments: BaseParser.java, fail.pdf
>
>
> When loading a PDF file containing a file attachment annotation, errors may occur when two conditions arise:
> - the Length value in the dictionary of the F stream is an indirect reference to an integer value
> - the content of the filtered stream contains the word 'endstream'
> Typically this occurs when the PDF file contains a stream description such as the following:
> 12 0 obj
> << /Length 16 0 R
> /Filter /FlateDecode
> >>
> stream
> {content}
> endstream
> endobj
> ...
> 16 0 obj
> {length}
> endobj
> ....
> and the (filtered) {content} contains the string "endstream"
> (see line 3700 of the attachment).
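Condition 1 matters because a strictly front-to-back parser reaches the stream before it has read object 16, so {length} is not yet available at that point. The Java sketch below shows what resolving such a reference involves once the whole file can be consulted. It is an illustration only: IndirectLengthDemo and resolveLength are hypothetical names, not PDFBox API, and the regular-expression search stands in for the cross-reference lookup a real parser would perform.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndirectLengthDemo {

    // "/Length 1234" (direct) or "/Length 16 0 R" (indirect reference).
    private static final Pattern LENGTH =
            Pattern.compile("/Length\\s+(\\d+)(?:\\s+(\\d+)\\s+R)?");

    /**
     * Returns the stream length declared in a stream dictionary, or -1 if it
     * cannot be determined. For an indirect value, the referenced
     * "num gen obj <integer> endobj" is searched for in the whole file text.
     */
    static long resolveLength(String dict, String wholeFile) {
        Matcher m = LENGTH.matcher(dict);
        if (!m.find()) {
            return -1;
        }
        if (m.group(2) == null) {
            return Long.parseLong(m.group(1)); // direct integer: known immediately
        }
        // Indirect: the value lives in a separate object, possibly later in the file.
        Pattern target = Pattern.compile("(?<!\\d)" + m.group(1) + "\\s+" + m.group(2)
                + "\\s+obj\\s+(\\d+)\\s+endobj");
        Matcher t = target.matcher(wholeFile);
        return t.find() ? Long.parseLong(t.group(1)) : -1;
    }

    public static void main(String[] args) {
        String file = "12 0 obj\n<< /Length 16 0 R /Filter /FlateDecode >>\nstream\n...\nendstream\nendobj\n"
                + "16 0 obj\n42\nendobj\n";
        System.out.println(resolveLength("<< /Length 16 0 R /Filter /FlateDecode >>", file)); // 42
        System.out.println(resolveLength("<< /Length 7 >>", file));                          // 7
    }
}
```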
> The problem lies in the way stream content is (always) read by the method readUntilEndStream(), which stops at the first 'endstream' sequence.
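The failure mode of a delimiter-based reader can be reproduced in a few lines. The sketch below uses a hypothetical readUntilFirstEndstream helper (not the actual PDFBox readUntilEndStream() code) that copies bytes up to the first "endstream" it finds; when the stream data itself contains that byte sequence, the result is truncated.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NaiveEndstreamDemo {

    private static final byte[] END = "endstream".getBytes(StandardCharsets.US_ASCII);

    /**
     * Hypothetical delimiter-based reader: returns the bytes from 'start' up to
     * (not including) the first occurrence of "endstream", or to the end of data.
     */
    static byte[] readUntilFirstEndstream(byte[] data, int start) {
        outer:
        for (int i = start; i <= data.length - END.length; i++) {
            for (int j = 0; j < END.length; j++) {
                if (data[i + j] != END[j]) {
                    continue outer; // mismatch: try the next start position
                }
            }
            return Arrays.copyOfRange(data, start, i); // first match wins
        }
        return Arrays.copyOfRange(data, start, data.length);
    }

    public static void main(String[] args) {
        String payload = "abc endstream def"; // stream data that happens to contain "endstream"
        byte[] file = ("stream\n" + payload + "\nendstream\nendobj").getBytes(StandardCharsets.US_ASCII);
        byte[] read = readUntilFirstEndstream(file, "stream\n".length());
        System.out.println("expected " + payload.length() + " bytes, got " + read.length);
        // prints: expected 17 bytes, got 4
    }
}
```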
> A (partial) fix was made that reads the stream content in one of three ways:
> - if the Length is known (i.e. it is a direct object), {length} bytes are read and written to the FilteredStream
> - if the Length is unknown and the filter is FlateFilter, the code decodes the data (the FlateDecode algorithm does not require the length of the encoded data to be known ahead of time) and associates the result with the stream's unfiltered stream
> - otherwise, the current behavior is kept
> Running the modified code on the files that exhibited errors fixed the problems encountered.
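The three-way strategy described in the report can be sketched in plain Java with java.util.zip.Inflater, which stops at the end of a zlib stream and reports through getRemaining() how many input bytes it did not consume. This is not the attached BaseParser.java patch: StreamReadStrategyDemo, read, StreamData, indexOf and deflate are hypothetical names, and the knownLength and flate arguments are assumed to have been determined from the stream dictionary beforehand.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class StreamReadStrategyDemo {

    static final byte[] END = "endstream".getBytes(StandardCharsets.US_ASCII);

    /** Raw (still encoded) stream bytes, plus decoded bytes when strategy 2 was used. */
    static final class StreamData {
        final byte[] raw;
        final byte[] decoded;
        StreamData(byte[] raw, byte[] decoded) { this.raw = raw; this.decoded = decoded; }
    }

    /** knownLength: the /Length value if it is a direct integer, or -1 if unknown. */
    static StreamData read(byte[] data, int start, long knownLength, boolean flate) {
        if (knownLength >= 0) {
            // 1. Length known: copy exactly that many bytes; an embedded "endstream" is harmless.
            int end = Math.toIntExact(start + knownLength);
            if (end > data.length) {
                throw new IllegalArgumentException("/Length exceeds available data");
            }
            return new StreamData(Arrays.copyOfRange(data, start, end), null);
        }
        if (flate) {
            // 2. Length unknown but FlateDecode: the zlib data signals its own end.
            Inflater inflater = new Inflater();
            inflater.setInput(data, start, data.length - start);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            try {
                while (!inflater.finished()) {
                    int n = inflater.inflate(buf);
                    if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                        throw new IllegalArgumentException("flate data truncated or needs a dictionary");
                    }
                    out.write(buf, 0, n);
                }
                // Bytes the inflater did not consume belong to whatever follows the stream.
                int consumed = data.length - start - inflater.getRemaining();
                return new StreamData(Arrays.copyOfRange(data, start, start + consumed), out.toByteArray());
            } catch (DataFormatException e) {
                throw new IllegalArgumentException("malformed flate data", e);
            } finally {
                inflater.end();
            }
        }
        // 3. Otherwise: previous behavior, stop at the first "endstream".
        return new StreamData(Arrays.copyOfRange(data, start, indexOf(data, END, start)), null);
    }

    /** Index of the first occurrence of 'needle' at or after 'from', or data.length if absent. */
    static int indexOf(byte[] data, byte[] needle, int from) {
        for (int i = from; i <= data.length - needle.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(data, i, i + needle.length), needle)) {
                return i;
            }
        }
        return data.length;
    }

    /** Test helper: zlib-compresses input, as a FlateDecode stream would contain. */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] plain = "abc endstream def\nendstream\nendobj".getBytes(StandardCharsets.US_ASCII);
        System.out.println(read(plain, 0, 17, false).raw.length);  // 17 (strategy 1)
        System.out.println(read(plain, 0, -1, false).raw.length);  // 4 (strategy 3 truncates)

        byte[] payload = "BT (endstream inside text) Tj ET".getBytes(StandardCharsets.US_ASCII);
        byte[] encoded = deflate(payload);
        byte[] trailer = "\nendstream\nendobj\n".getBytes(StandardCharsets.US_ASCII);
        byte[] data = Arrays.copyOf(encoded, encoded.length + trailer.length);
        System.arraycopy(trailer, 0, data, encoded.length, trailer.length);
        StreamData s = read(data, 0, -1, true);                    // strategy 2
        System.out.println(s.raw.length == encoded.length);        // true
        System.out.println(Arrays.equals(s.decoded, payload));     // true
    }
}
```

Strategy 2 relies on FlateDecode data being zlib-formatted, which is what Inflater expects with its default constructor; whatever the inflater leaves unconsumed is the end-of-line and endstream keyword that follow the stream data.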


--
This message is automatically generated by JIRA.
