pdfbox-users mailing list archives

From Steve Antoch <SAnt...@Yuzu.com>
Subject COSParser: re-entering getLength() via parseObjectDynamically()
Date Thu, 19 Feb 2015 00:14:48 GMT

I have a question regarding the limitation on entering getLength() for a second time.

I understand that it is possible to create a malicious PDF which essentially sends the parser
into an infinite loop by having it parse nested streams that refer to each other.  I do not
believe this to be the case with these files (they are from well-known corporate book publishers).

As far as I can tell, pdfbox prohibits this nesting behavior by passing boolean flags around:
it sets the inGetLength flag when it first enters getLength() and clears it upon exit.
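
To illustrate the flag approach, here is a minimal sketch. It is not pdfbox's actual
COSParser code: the class FlagGuardSketch, the define() helper, and the two maps are
hypothetical, and an unchecked exception is used only to keep the example short. A stream's
length is modeled as possibly depending on another stream that has to be read first.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not PDFBox's COSParser code.
class FlagGuardSketch {
    // True while a getLength() call is in progress.
    private boolean inGetLength = false;
    // Object number -> stream length.
    private final Map<Integer, Integer> lengths = new HashMap<>();
    // Object number -> object number of a stream that must be read first (assumed model).
    private final Map<Integer, Integer> containerOf = new HashMap<>();

    void define(int id, int length, Integer container) {
        lengths.put(id, length);
        if (container != null) {
            containerOf.put(id, container);
        }
    }

    int getLength(int id) {
        // Any second entry is rejected, even a harmless one-level nesting.
        if (inGetLength) {
            throw new IllegalStateException("getLength() re-entered while resolving object " + id);
        }
        inGetLength = true;
        try {
            Integer container = containerOf.get(id);
            if (container != null) {
                getLength(container); // nested call, always refused by the flag
            }
            return lengths.getOrDefault(id, 0);
        } finally {
            inGetLength = false; // clear the flag on every exit path
        }
    }

    public static void main(String[] args) {
        FlagGuardSketch parser = new FlagGuardSketch();
        parser.define(1, 100, null); // length available directly
        parser.define(2, 50, 1);     // needs object 1 to be read first
        System.out.println(parser.getLength(1));
        try {
            parser.getLength(2);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model a single level of legitimate nesting is refused in exactly the same way as a
genuine reference cycle, because a boolean cannot tell the two apart.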

I have several pdfs which open fine in Acrobat and Google Chrome (which is based on the
pdfium engine), yet when I try to open them using pdfbox they throw the "Object must be defined
and must not be compressed object" error.

By observation, pdfium seems to get around this issue by keeping a counter of recursion depth
(they use 64 max), which allows a shallow level of nesting but throws an exception if the
nesting gets too deep.
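
A minimal sketch of the depth-counter idea follows. It is neither pdfium's code nor the
change in my fork: the class DepthGuardSketch, the define() helper, and the two maps are
hypothetical, the limit of 64 is simply the value I observed in pdfium, and an unchecked
exception is used only for brevity. A stream's length is again modeled as possibly depending
on another stream that has to be read first.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not pdfium's or PDFBox's actual code.
class DepthGuardSketch {
    static final int MAX_DEPTH = 64; // value taken from the pdfium observation
    // Number of getLength() calls currently on the stack.
    private int depth = 0;
    // Object number -> stream length.
    private final Map<Integer, Integer> lengths = new HashMap<>();
    // Object number -> object number of a stream that must be read first (assumed model).
    private final Map<Integer, Integer> containerOf = new HashMap<>();

    void define(int id, int length, Integer container) {
        lengths.put(id, length);
        if (container != null) {
            containerOf.put(id, container);
        }
    }

    int currentDepth() {
        return depth;
    }

    int getLength(int id) {
        // Shallow nesting is allowed; only runaway recursion is refused.
        if (depth >= MAX_DEPTH) {
            throw new IllegalStateException("getLength() nested deeper than " + MAX_DEPTH
                    + " while resolving object " + id);
        }
        depth++;
        try {
            Integer container = containerOf.get(id);
            if (container != null) {
                getLength(container); // nested call, counted rather than refused
            }
            return lengths.getOrDefault(id, 0);
        } finally {
            depth--; // restore the counter on every exit path
        }
    }

    public static void main(String[] args) {
        DepthGuardSketch parser = new DepthGuardSketch();
        parser.define(1, 100, null); // length available directly
        parser.define(2, 50, 1);     // one level of nesting
        parser.define(3, 10, 4);     // objects 3 and 4 refer to each other
        parser.define(4, 20, 3);
        System.out.println(parser.getLength(2));
        try {
            parser.getLength(3);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a counter, a short legitimate chain resolves normally, while a cycle still terminates
with an exception once the limit is reached.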

I have forked pdfbox up on Github and made those minor changes.

This seems to allow me to open the few files in question. I'd like you to take a look at the
change and comment on it if you would.


Please let me know what you think-