pdfbox-dev mailing list archives

From "Dave Engberg (JIRA)" <j...@apache.org>
Subject [jira] Created: (PDFBOX-503) PDF loader causes infinite loop on non-PDF inputs
Date Wed, 12 Aug 2009 23:52:14 GMT
PDF loader causes infinite loop on non-PDF inputs

                 Key: PDFBOX-503
                 URL: https://issues.apache.org/jira/browse/PDFBOX-503
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 0.8.0-incubator
            Reporter: Dave Engberg

The current SVN head of the PDFBox incubator project goes into an infinite loop in
PDFParser.parseHeader() when the parser is given any non-PDF document.  parseHeader() searches
for the PDF header by skipping every non-matching line that does not start with a numeric
digit.  It relies on the readLine() function in BaseParser.java, which returns an empty string
once the stream is at end-of-file.  An empty string neither matches the header nor starts with
a digit, so parseHeader() keeps looping on these empty lines forever.
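
The failure mode can be reproduced outside PDFBox with a minimal sketch. Everything in it is
an assumption made for illustration: EofLoopDemo, isHeaderCandidate() and countLinesScanned()
are hypothetical names, the readLine() here is a simplified stand-in for the behavior described
above rather than the real BaseParser code, and the maxLines cap exists only so the demo
terminates.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the behavior described in this report; not PDFBox code.
class EofLoopDemo {
    // Reads up to the next '\n'. At end-of-file it returns "" instead of
    // signalling EOF, which is the behavior that causes the problem.
    static String readLine(InputStream in) throws IOException {
        StringBuilder buffer = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                break;
            }
            if (c != '\r') {
                buffer.append((char) c);
            }
        }
        return buffer.toString();
    }

    // A line ends the search if it contains the PDF header or starts with a digit.
    static boolean isHeaderCandidate(String line) {
        return line.contains("%PDF-")
                || (!line.isEmpty() && Character.isDigit(line.charAt(0)));
    }

    // Skips non-matching lines and reports how many lines were read.
    // maxLines is an artificial cap; without it a non-PDF input never returns.
    static int countLinesScanned(InputStream in, int maxLines) throws IOException {
        String line = readLine(in);
        int scanned = 1;
        while (!isHeaderCandidate(line) && scanned < maxLines) {
            line = readLine(in); // returns "" forever once the stream is exhausted
            scanned++;
        }
        return scanned;
    }

    public static void main(String[] args) throws IOException {
        byte[] notPdf = "hello\nworld\n".getBytes(StandardCharsets.US_ASCII);
        // The loop stops only because of the cap, not because EOF was noticed.
        System.out.println(countLinesScanned(new ByteArrayInputStream(notPdf), 1000));
    }
}
```

With a two-line non-PDF input the search stops only when it hits the cap, which shows that
end-of-file is never detected by the loop itself.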

I've patched this in our system by throwing an IOException from BaseParser.readLine() if the
stream is already at the end-of-file at the beginning of that call.

Index: src/main/java/org/apache/pdfbox/pdfparser/BaseParser.java
--- src/main/java/org/apache/pdfbox/pdfparser/BaseParser.java	(revision 802578)
+++ src/main/java/org/apache/pdfbox/pdfparser/BaseParser.java	(working copy)
@@ -1088,6 +1088,11 @@
         StringBuffer buffer = new StringBuffer( 11 );
+        if (pdfSource.isEOF())
+        {
+            throw new IOException( "Error: End-of-File, expected line");
+        }
         int c;
         while ((c = pdfSource.read()) != -1) 
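
The effect of the patch on a header search can be sketched in isolation. This is an
illustration under stated assumptions, not the PDFBox implementation: EofCheckedReader and
findHeader() are hypothetical names, and isEOF() is emulated here by peeking one byte through
java.io.PushbackInputStream, which may differ from how pdfSource.isEOF() works internally.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical reader that mirrors the patched readLine() behavior.
class EofCheckedReader {
    private final PushbackInputStream source;

    EofCheckedReader(byte[] data) {
        this.source = new PushbackInputStream(new ByteArrayInputStream(data), 1);
    }

    // Peeks one byte; pushes it back if the stream is not yet exhausted.
    boolean isEOF() throws IOException {
        int peek = source.read();
        if (peek == -1) {
            return true;
        }
        source.unread(peek);
        return false;
    }

    // Same idea as the patch: fail fast when a line is requested at end-of-file.
    String readLine() throws IOException {
        if (isEOF()) {
            throw new IOException("Error: End-of-File, expected line");
        }
        StringBuilder buffer = new StringBuilder();
        int c;
        while ((c = source.read()) != -1) {
            if (c == '\n') {
                break;
            }
            if (c != '\r') {
                buffer.append((char) c);
            }
        }
        return buffer.toString();
    }

    // Simplified header search: returns true once a header line is seen,
    // and propagates the IOException when the input ends without one.
    static boolean findHeader(byte[] data) throws IOException {
        EofCheckedReader reader = new EofCheckedReader(data);
        String line = reader.readLine();
        while (!line.contains("%PDF-")) {
            line = reader.readLine();
        }
        return true;
    }

    public static void main(String[] args) {
        try {
            findHeader("not a pdf\n".getBytes(StandardCharsets.US_ASCII));
        } catch (IOException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Because the exception is raised inside readLine(), the calling loop needs no extra
termination condition; callers that want a friendlier message can catch the IOException.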

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
