pdfbox-dev mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: PDFBox 1.8.8. release
Date Mon, 24 Nov 2014 13:57:03 GMT
Andreas, 

Sounds good.  If you could ping me on TIKA-1442, I'll be sure to hear the message in a timely
fashion. :)

I just tried to build Tika with 1.8.8-SNAPSHOT, and I found a problem with the non-sequential
parser on one of our test files (http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/resources/test-documents/testPDF_protected.pdf).

This is the stacktrace with pdfbox-app-1.8.8-20141124.081221-143.jar's ExtractText -nonSeq:


Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
ExtractText failed with the following exception:
java.io.IOException
        at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:109)
        at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:379)
        at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
        at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
        at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
        at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:480)
        at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:405)
        at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:364)
        at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)
        at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
        at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
Caused by: java.util.zip.DataFormatException: incorrect header check
        at java.util.zip.Inflater.inflateBytes(Native Method)
        at java.util.zip.Inflater.inflate(Inflater.java:259)
        at java.util.zip.Inflater.inflate(Inflater.java:280)
        at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:128)
        at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:101)
        ... 13 more
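
For reference, the "incorrect header check" message in the trace originates in java.util.zip.Inflater, which raises a DataFormatException when the bytes it is given do not begin with a valid zlib header. The following is a minimal sketch, independent of PDFBox, that shows the same exception type on arbitrary non-zlib input. The class and method names are hypothetical, and it does not establish why such bytes reached the filter for this particular file.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class; not part of PDFBox or Tika.
class InflaterHeaderCheck {

    /** Compresses data into a zlib-wrapped deflate stream. */
    public static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /**
     * Tries to inflate data. Returns null if no format error was raised,
     * otherwise the message of the DataFormatException.
     */
    public static String tryInflate(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // Stop if the inflater cannot make progress without more input.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break;
                }
            }
            return null;
        } catch (DataFormatException e) {
            return String.valueOf(e.getMessage());
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] valid = deflate("hello".getBytes(StandardCharsets.UTF_8));
        byte[] notZlib = "Hello, not zlib".getBytes(StandardCharsets.UTF_8);
        System.out.println("valid stream error:    " + tryInflate(valid));
        System.out.println("non-zlib stream error: " + tryInflate(notZlib));
    }
}
```

Running main prints the error (if any) for a well-formed stream and for plain text passed off as a compressed stream; only the second should report a format error.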

-----Original Message-----
From: Andreas Lehmkühler [mailto:andreas@lehmi.de] 
Sent: Monday, November 24, 2014 8:39 AM
To: dev@pdfbox.apache.org
Subject: RE: PDFBox 1.8.8. release

Hi,

> "Allison, Timothy B." <tallison@mitre.org> wrote on 24 November 2014 at 13:10:
> 
> 
> Let me know when to hit "run"...
Thanks for the offer. There is just one thing related to PDFBOX-2430 I'd like to
fix this evening...

BR
Andreas Lehmkühler

> 
> -----Original Message-----
> From: Andreas Lehmkuehler [mailto:andreas@lehmi.de] 
> Sent: Sunday, November 23, 2014 12:27 PM
> To: dev@pdfbox.apache.org
> Subject: Re: PDFBox 1.8.8. release
> 
> Hi,
> 
> On 23.11.2014 at 17:55, Tilman Hausherr wrote:
> > Hi.
> >
> > I'd prefer to wait for the tests of Tim Allison... unless you want to live
> > with the risk that he does the tests, and that we find a "big problem"
> > within that 3 day voting period...
> Good point.
> 
> > I haven't asked him to do these tests yet, because so much work was done on
> > both parsers.
> I guess I'm done with parser changes at least in the 1.8 branch
> 
> > Tilman
> 
> BR
> Andreas Lehmkühler
> 
> >
> > On 23.11.2014 at 17:14, Andreas Lehmkuehler wrote:
> >> Hi,
> >>
> >> On 11.11.2014 at 12:15, Andreas Lehmkühler wrote:
> >>> Hi,
> >>>
> >>>> Andreas Lehmkühler <andreas@lehmi.de> wrote on 3 November 2014 at 11:52:
> >>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> there are again a number of solved issues and I'm thinking about a new
> >>>> bugfix release. How about a new one next week, maybe later if someone
> >>>> wants to get some additional things done before?
> >>> Looks like I won't have the time this week to cut the release, sorry.
> >>> I'm not sure if I'll find some time when attending ApacheCon in Budapest
> >>> next week, but I should have some cycles in the last week of November.
> >>>
> >>> This will buy us some time to fix some of the encryption/decryption issues.
> >> I'm going to cut the release tomorrow in the evening, round about 24 hours
> >> from now. Any objections?
> >>
> >>
> >> BR
> >> Andreas Lehmkühler
> >
>