pdfbox-dev mailing list archives

From Andreas Lehmkuehler <andr...@lehmi.de>
Subject Re: PDFBox 1.8.8. release
Date Mon, 24 Nov 2014 20:07:24 GMT
Hi,


On 24.11.2014 at 14:57, Allison, Timothy B. wrote:
> Andreas,
>
> Sounds good.  If you could ping me on TIKA-1442, I'll be sure to hear the message in a timely fashion. :)
Done!

> I just tried to build Tika with 1.8.8-SNAPSHOT, and I found a problem with the non-sequential parser on one of our test files (http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/resources/test-documents/testPDF_protected.pdf).
>
> This is the stacktrace with pdfbox-app-1.8.8-20141124.081221-143.jar's ExtractText -nonSeq:
I've added my changes and fixed the described issue as well.

BR
Andreas Lehmkühler

> Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> ExtractText failed with the following exception:
> java.io.IOException
>          at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:109)
>          at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:379)
>          at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
>          at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
>          at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
>          at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
>          at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>          at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>          at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:480)
>          at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:405)
>          at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:364)
>          at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)
>          at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
>          at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
> Caused by: java.util.zip.DataFormatException: incorrect header check
>          at java.util.zip.Inflater.inflateBytes(Native Method)
>          at java.util.zip.Inflater.inflate(Inflater.java:259)
>          at java.util.zip.Inflater.inflate(Inflater.java:280)
>          at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:128)
>          at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:101)
>          ... 13 more
>
> -----Original Message-----
> From: Andreas Lehmkühler [mailto:andreas@lehmi.de]
> Sent: Monday, November 24, 2014 8:39 AM
> To: dev@pdfbox.apache.org
> Subject: RE: PDFBox 1.8.8. release
>
> Hi,
>
>> On 24 November 2014 at 13:10, "Allison, Timothy B." <tallison@mitre.org> wrote:
>>
>>
>> Let me know when to hit "run"...
> Thanks for the offer, there is just one thing related to PDFBOX-2430 I'd like to fix this evening...
>
> BR
> Andreas Lehmkühler
>
>>
>> -----Original Message-----
>> From: Andreas Lehmkuehler [mailto:andreas@lehmi.de]
>> Sent: Sunday, November 23, 2014 12:27 PM
>> To: dev@pdfbox.apache.org
>> Subject: Re: PDFBox 1.8.8. release
>>
>> Hi,
>>
>> On 23.11.2014 at 17:55, Tilman Hausherr wrote:
>>> Hi.
>>>
>>> I'd prefer to wait for the tests of Tim Allison... unless you want to live with
>>> the risk that he does the tests, and that we find a "big problem" within that
>>> 3 day voting period...
>> Good point.
>>
>>> I haven't asked him to do these tests yet, because so much work was done on both
>>> parsers.
>> I guess I'm done with parser changes, at least in the 1.8 branch.
>>
>>> Tilman
>>
>> BR
>> Andreas Lehmkühler
>>
>>>
>>> On 23.11.2014 at 17:14, Andreas Lehmkuehler wrote:
>>>> Hi,
>>>>
>>>> On 11.11.2014 at 12:15, Andreas Lehmkühler wrote:
>>>>> Hi,
>>>>>
>>>>>> On 3 November 2014 at 11:52, Andreas Lehmkühler <andreas@lehmi.de> wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> there are again a number of solved issues and I'm thinking about a new
>>>>>> bugfix release. How about a new one next week, maybe later if someone
>>>>>> wants to get some additional things done before?
>>>>> Looks like I won't have the time this week to cut the release, sorry.
>>>>> I'm not sure if I'll find some time when attending ApacheCon in Budapest next week,
>>>>> but I should have some cycles in the last week of November.
>>>>>
>>>>> This will buy us some time to fix some of the encryption/decryption issues.
>>>> I'm going to cut the release tomorrow evening, roughly 24 hours
>>>> from now. Any objections?
>>>>
>>>>
>>>> BR
>>>> Andreas Lehmkühler
>>>
>>

