pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Error on PDDocument.load
Date Wed, 11 Feb 2015 08:29:20 GMT
Yes, they made hacks. So did we, for many types of malformed files.
Please also send the file to Andreas, unless you already did; he has
written many workarounds for malformed files.

Tilman

On 11.02.2015 at 09:05, Kevin Morin wrote:
> OK. Why are other programs (like xpdf) able to open it? I guess they
> made a hack to fix this? Are you going to do something too?
>
> Thanks
> BR
>
> Kevin
>
> On 11/02/2015 08:53, Tilman Hausherr wrote:
>> Hi,
>>
>> I can reproduce the error. Your file is malformed. Please open it with
>> NOTEPAD++ and go to the end:
>>
>> xref
>> 1 7
>> 0000000000 65535 f
>> 0000000009 00000 n
>> 0000358745 00000 n
>> 0000358842 00000 n
>> 0000359029 00000 n
>> 0000359087 00000 n
>> 0000359138 00000 n
>> trailer
>>
>> The first number (1) is the object number of the first entry in the
>> table, so the entries would start at object 1. The second number (7)
>> is the number of entries in the table. The 1 is incorrect: it should
>> be 0, because "0000000000 65535 f" is the dummy object 0. Press CTRL-G
>> and enter the offsets (e.g. 9, 45, 358745, ...) and you will see what
>> I mean.
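
A minimal Java sketch of the check described above. It is not PDFBox code: the class `XrefHeaderCheck` and its methods are hypothetical names made up for illustration. The repair rule (a leading free entry with generation 65535 means the subsection should start at object 0) is an assumption drawn from this explanation, not a general-purpose fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class XrefHeaderCheck {

    /** One xref entry: offset (or next free object number), generation, 'n' (in use) or 'f' (free). */
    static final class Entry {
        final long offset;
        final int generation;
        final char type;

        Entry(long offset, int generation, char type) {
            this.offset = offset;
            this.generation = generation;
            this.type = type;
        }
    }

    /** Parses the "first count" header line, e.g. "1 7" gives {1, 7}. */
    static int[] parseHeader(String headerLine) {
        String[] parts = headerLine.trim().split("\\s+");
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    /** Parses entry lines such as "0000000009 00000 n". */
    static List<Entry> parseEntries(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            entries.add(new Entry(Long.parseLong(parts[0]),
                    Integer.parseInt(parts[1]), parts[2].charAt(0)));
        }
        return entries;
    }

    /**
     * Assumed heuristic: if the first entry looks like the head of the free list
     * (free, generation 65535) but the header does not start at object 0,
     * treat the header as wrong and use 0 as the first object number.
     */
    static int correctedFirstObject(int declaredFirst, List<Entry> entries) {
        if (declaredFirst != 0 && !entries.isEmpty()) {
            Entry head = entries.get(0);
            if (head.type == 'f' && head.generation == 65535) {
                return 0;
            }
        }
        return declaredFirst;
    }

    public static void main(String[] args) {
        // The table quoted in this message.
        int[] header = parseHeader("1 7");
        List<Entry> entries = parseEntries(Arrays.asList(
                "0000000000 65535 f", "0000000009 00000 n", "0000358745 00000 n",
                "0000358842 00000 n", "0000359029 00000 n", "0000359087 00000 n",
                "0000359138 00000 n"));
        int first = correctedFirstObject(header[0], entries);
        for (int i = 0; i < entries.size(); i++) {
            Entry e = entries.get(i);
            System.out.println("object " + (first + i) + " -> " + e.offset + " " + e.type);
        }
    }
}
```

With the header as written (first object 1), the last entry, offset 359138, is attributed to object 7, which appears to match the "Can't find the object 7 0 (origin offset 359138)" error reported earlier in this thread. Starting at 0 instead attributes it to object 6.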
>>
>>  From the pdf spec:
>>
>> The free entries in the cross-reference table form a linked list, with
>> each free entry containing the object number of the next. The first
>> entry in the table (object number 0) is always free and has a generation
>> number of 65,535; it is the head of the linked list of free objects
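
To illustrate the linked list described in this excerpt, here is a small Java sketch that follows the chain of free entries starting at object 0. `FreeListWalk` and the `nextFree` map are hypothetical names; the sketch also assumes, as the PDF specification states elsewhere, that the last free entry links back to object 0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FreeListWalk {

    /**
     * Follows the free list starting at object 0.
     * nextFree maps each free object's number to the "next free object"
     * number stored in its xref entry. The walk stops when the chain returns
     * to an object already visited or reaches an object that is not free.
     */
    static List<Integer> walk(Map<Integer, Integer> nextFree) {
        List<Integer> chain = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Integer current = 0;
        while (current != null && nextFree.containsKey(current) && seen.add(current)) {
            chain.add(current);
            current = nextFree.get(current);
        }
        return chain;
    }

    public static void main(String[] args) {
        // Made-up example: object 0 -> 3 -> 5 -> back to 0.
        Map<Integer, Integer> nextFree = new HashMap<>();
        nextFree.put(0, 3);
        nextFree.put(3, 5);
        nextFree.put(5, 0);
        System.out.println(walk(nextFree));
    }
}
```

In the table quoted in this message, object 0 is the only free entry and its first field is 0, so the chain consists of object 0 alone.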
>>
>> Tilman
>>
>>
>> On 11.02.2015 at 08:21, Kevin Morin wrote:
>>> Hi,
>>>
>>> I am sorry, it seems that I did not send you the right file...
>>> Actually, I was also testing the wrong file on linux from the
>>> beginning. The right file also displays blank on linux, with java 7
>>> or 8... Here is the right file.
>>>
>>> I am sorry to make you work for nothing...
>>>
>>> BR
>>>
>>> Kevin
>>>
>>>
>>> On 10/02/2015 21:32, Tilman Hausherr wrote:
>>>> So we e-mailed and the result is
>>>> - you're really working on W2008 with the file that you sent me
>>>> - you get the same error on W2008 with the app (and I don't)
>>>>
>>>> I have analysed that file and did some debug traces. If loading that
>>>> on W2008 is a no-no, you'd have to build from source and I'll tell
>>>> you the changes.
>>>>
>>>> http://home.snafu.de/tilman/tmp/pdfbox-app-2.0.0-TILMAN.jar
>>>>
>>>> Don't use that version for production. It contains lots of stuff for
>>>> my own tests. Only use it for this problem. Here's the output that
>>>> you should get:
>>>>
>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.COSParser parseXrefStream
>>>> INFORMATION: parseXrefStream: objByteOffset = 116
>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>> INFORMATION: PDFXrefStreamParser: 7 0 obj at offset: 16
>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>> INFORMATION: PDFXrefStreamParser: 8 0 obj at offset: 573
>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>> INFORMATION: PDFXrefStreamParser: 9 0 obj at offset: 633
>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>> INFORMATION: PDFXrefStreamParser: 10 0 obj at offset: 817
>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>> INFORMATION: PDFXrefStreamParser: 11 0 obj at offset: 914
>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>> INFORMATION: PDFXrefStreamParser: 12 0 obj at offset: 116
>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>> INFORMATION: PDFXrefStreamParser: 13 0 obj at offset: 436
>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.COSParser parseXrefStream
>>>> INFORMATION: parseXrefStream: objByteOffset = 363505
>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>> INFORMATION: PDFXrefStreamParser: 1 0 obj at offset: 359638
>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>> INFORMATION: PDFXrefStreamParser: 2 0 obj at offset: 363167
>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>> INFORMATION: PDFXrefStreamParser: 3 0 obj at offset: 363307
>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>> INFORMATION: PDFXrefStreamParser: 4 0 obj at offset: 363505
>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>> INFORMATION: PDFXrefStreamParser: 5 stmnr: 2
>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>> INFORMATION: PDFXrefStreamParser: 6 stmnr: 3
>>>>
>>>> What I wonder is if the offsets will be the same.
>>>>
>>>> Tilman
>>>>
>>>> PS: Sorry I usually can't help during EU business hours. Day job :-)
>>>>
>>>>
>>>> On 09.02.2015 at 11:26, Kevin Morin wrote:
>>>>> Hi,
>>>>>
>>>>> I will probably have to migrate to java 8 because of a bug in java 7
>>>>> which throws an error when rendering a certain type of PDF (cf thread
>>>>> Error on PDFRenderer.renderImage (PDFBox 2.0)). Could someone please
>>>>> check why it is not working on Windows Server 2008 R2 Standard? If
>>>>> you do not have this OS, tell me what I can do to help you.
>>>>>
>>>>> Thanks
>>>>> BR
>>>>>
>>>>> Kevin
>>>>>
>>>>> On 21/01/2015 12:26, Andreas Lehmkühler wrote:
>>>>>> Hi,
>>>>>>
>>>>>>> On 21 January 2015 at 12:14, Kevin Morin <morin@codelutin.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> I thought I was running java 7 but it's java 8... I tried with
>>>>>>> java 7 and it works. I do not need it to work with java 8, java 7
>>>>>>> is ok for me.
>>>>>> It works for me using java 8 on win7 and linux as well. I guess the
>>>>>> issue has to be something else....
>>>>>>
>>>>>>
>>>>>> BR
>>>>>> Andreas Lehmkühler
>>>>>>
>>>>>>> Thanks for your help and for all your work.
>>>>>>>
>>>>>>> Kevin
>>>>>>>
>>>>>>> On 21/01/2015 11:54, Maruan Sahyoun wrote:
>>>>>>>> Hi Kevin
>>>>>>>>
>>>>>>>> works for me - what's your Java Version?
>>>>>>>>
>>>>>>>> BR
>>>>>>>> Maruan
>>>>>>>>
>>>>>>>> On 21.01.2015 at 11:24, Kevin Morin <morin@codelutin.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> It does not work with PDFToImage either; I still get a blank
>>>>>>>>> image. Also, I did not set the nonSeq option, yet it seems to be
>>>>>>>>> using the non-sequential parser. I get the following traces:
>>>>>>>>> janv. 21, 2015 11:20:02 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser checkXrefOffsets
>>>>>>>>> GRAVE: Can't find the object 7 0 (origin offset 359138)
>>>>>>>>> janv. 21, 2015 11:20:03 AM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
>>>>>>>>> GRAVE: Missing XObject: Im1
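
The first trace above reports that object 7 0 was not found at the offset recorded for it. The idea behind such a check can be sketched in Java as follows, assuming it simply looks for the `N G obj` keyword sequence at the recorded byte offset. `ObjectOffsetCheck` and `objectStartsAt` are hypothetical names; this is not the PDFBox implementation, which may tolerate more variations.

```java
import java.nio.charset.StandardCharsets;

class ObjectOffsetCheck {

    /**
     * Returns true if the bytes at the given offset spell "<objectNumber> <generation> obj".
     * Assumption of this sketch: exactly one space between the tokens.
     */
    static boolean objectStartsAt(byte[] pdf, long offset, int objectNumber, int generation) {
        byte[] expected = (objectNumber + " " + generation + " obj")
                .getBytes(StandardCharsets.ISO_8859_1);
        if (offset < 0 || offset + expected.length > pdf.length) {
            return false; // offset points outside the file
        }
        for (int i = 0; i < expected.length; i++) {
            if (pdf[(int) offset + i] != expected[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Tiny made-up file fragment: the 9-byte header line, then object 1.
        byte[] pdf = "%PDF-1.4\n1 0 obj\n<<>>\nendobj\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(objectStartsAt(pdf, 9, 1, 0)); // object 1 really starts at offset 9
        System.out.println(objectStartsAt(pdf, 9, 2, 0)); // a shifted object number fails the check
    }
}
```

For a real file, the bytes could be read with `java.nio.file.Files.readAllBytes` and each xref offset checked against the object number assigned to it.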
>>>>>>>>>
>>>>>>>>> BR
>>>>>>>>>
>>>>>>>>> Kevin
>>>>>>>>>
>>>>>>>>> On 21/01/2015 11:11, Maruan Sahyoun wrote:
>>>>>>>>>> Hi Kevin,
>>>>>>>>>>
>>>>>>>>>> you can test with the PDFToImage command [1], available in the
>>>>>>>>>> pdfbox-app [2], to see whether the issue happens there. The
>>>>>>>>>> source for PDFToImage is available in the tools section of the
>>>>>>>>>> SVN repo or viewable online [3].
>>>>>>>>>>
>>>>>>>>>> BR
>>>>>>>>>> Maruan
>>>>>>>>>>
>>>>>>>>>> [1] https://pdfbox.apache.org/1.8/commandline.html#pdfToImage
>>>>>>>>>> [2] https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/
>>>>>>>>>> [3] http://svn.apache.org/viewvc/pdfbox/trunk/tools/src/main/java/org/apache/pdfbox/tools/PDFToImage.java?view=markup
>>>>>>>>>>
>>>>>>>>>> On 21.01.2015 at 11:00, Kevin Morin <morin@codelutin.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Andreas,
>>>>>>>>>>>
>>>>>>>>>>> I am using the latest snapshot available on the maven
>>>>>>>>>>> repository. I am running my app on Windows Server 2008 R2
>>>>>>>>>>> Standard and it does not work (white page). Could you send me
>>>>>>>>>>> the code or a jar to test on this server, to check whether the
>>>>>>>>>>> problem comes from my code?
>>>>>>>>>>>
>>>>>>>>>>> BR
>>>>>>>>>>>
>>>>>>>>>>> Kevin
>>>>>>>>>>>
>>>>>>>>>>> On 19/01/2015 19:13, Andreas Lehmkuehler wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> On 19.01.2015 at 12:45, Kevin Morin wrote:
>>>>>>>>>>>>> Actually, the issue is not only these traces. The real issue
>>>>>>>>>>>>> is that I get a blank image when I try to render the
>>>>>>>>>>>>> document.
>>>>>>>>>>>> I've checked your PDF and everything renders fine. I've tried
>>>>>>>>>>>> SNAPSHOT-891 on linux (running java 1.8, 1.7 and 1.6) and the
>>>>>>>>>>>> latest SNAPSHOT-947 on win7 running java 1.7.
>>>>>>>>>>>>
>>>>>>>>>>>> Maybe your SNAPSHOT is outdated?
>>>>>>>>>>>>
>>>>>>>>>>>> BR
>>>>>>>>>>>> Andreas Lehmkühler
>>>>>>>>>>>>
>>>>>>>>>>>>> On 19/01/2015 12:39, Kevin Morin wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am using the 2.0 snapshot version to render images of
>>>>>>>>>>>>>> PDFs, but on some documents I get the following errors
>>>>>>>>>>>>>> when I call PDDocument.load(file):
>>>>>>>>>>>>>> 2015/01/19 12:32:48 ERROR (org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1864) - Can't find the object 7 0 (origin offset 359138)
>>>>>>>>>>>>>> 2015/01/19 12:32:48 ERROR (org.apache.pdfbox.contentstream.PDFStreamEngine:840) - Missing XObject: Im1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I first had it a few days ago (I did not report it, shame
>>>>>>>>>>>>>> on me), but the error did not occur when I called the
>>>>>>>>>>>>>> loadLegacy method on PDDocument. But the loadLegacy method
>>>>>>>>>>>>>> is not available anymore...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The issue happens on Windows (works fine on Debian).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for your help
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Kevin
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>>>>

