pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Error on PDDocument.load
Date Wed, 11 Feb 2015 22:16:06 GMT
I wasn't able to create a non-confidential version of the file that
works with Adobe Reader, but here's an issue and a proposed patch.

https://issues.apache.org/jira/browse/PDFBOX-2679

Tilman

On 11.02.2015 at 18:54, Tilman Hausherr wrote:
> No, his file is confidential.
>
> However, we might create a non-confidential file that has the same error.
>
> Tilman
>
> On 11.02.2015 at 18:40, John Hewson wrote:
>> Can we get a JIRA issue opened for this, preferably with the file
>> attached?
>>
>> -- John
>>
>>> On 11 Feb 2015, at 00:29, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>
>>> Yes, they made hacks. So did we, for many types of malformed files.
>>> Please also send the file to Andreas, unless you already did; he has
>>> written many workarounds for malformed files.
>>>
>>> Tilman
>>>
>>>> On 11.02.2015 at 09:05, Kevin Morin wrote:
>>>> OK. Why is other software (like xpf) able to open it? I guess
>>>> they made a hack to fix this? Are you going to do something too?
>>>>
>>>> Thanks
>>>> BR
>>>>
>>>> Kevin
>>>>
>>>>> On 11/02/2015 08:53, Tilman Hausherr wrote:
>>>>> Hi,
>>>>>
>>>>> I can reproduce the error. Your file is malformed. Please open it with
>>>>> Notepad++ and go to the end:
>>>>>
>>>>> xref
>>>>> 1 7
>>>>> 0000000000 65535 f
>>>>> 0000000009 00000 n
>>>>> 0000358745 00000 n
>>>>> 0000358842 00000 n
>>>>> 0000359029 00000 n
>>>>> 0000359087 00000 n
>>>>> 0000359138 00000 n
>>>>> trailer
>>>>>
>>>>> The first number (1) is the object number of the first entry in this
>>>>> subsection, so the entries would be numbered starting at 1. The second
>>>>> number (7) is the number of entries. The 1 is incorrect; it should be 0,
>>>>> because "0000000000 65535 f" is the dummy object 0. Press CTRL-G and
>>>>> enter the offsets (e.g. 9, 45, 358745, ...) and you will see what I mean.
>>>>>
>>>>> From the PDF spec:
>>>>>
>>>>> The free entries in the cross-reference table form a linked list, with
>>>>> each free entry containing the object number of the next. The first
>>>>> entry in the table (object number 0) is always free and has a generation
>>>>> number of 65,535; it is the head of the linked list of free objects.
>>>>>
>>>>> Tilman
>>>>>
>>>>>
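The off-by-one described above can be sketched in code. The following is a minimal, self-contained Java illustration, not PDFBox's parser and not the patch attached to PDFBOX-2679. The class and method names (XrefSubsectionCheck, parseEntries, expectedFirstObject) are hypothetical. It relies on a heuristic that is an assumption of this sketch: a first entry of type 'f' with offset 0 and generation 65535 is taken to be the free-list head, so it must describe object 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: checks the header of a classic xref subsection. Not PDFBox code. */
final class XrefSubsectionCheck {

    /** One xref entry: byte offset (or next free object), generation, 'n' or 'f'. */
    static final class Entry {
        final long offset;
        final int generation;
        final char type;

        Entry(long offset, int generation, char type) {
            this.offset = offset;
            this.generation = generation;
            this.type = type;
        }
    }

    /** Parses entry lines such as "0000000009 00000 n". */
    static List<Entry> parseEntries(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            entries.add(new Entry(Long.parseLong(parts[0]),
                    Integer.parseInt(parts[1]), parts[2].charAt(0)));
        }
        return entries;
    }

    /**
     * Heuristic (an assumption of this sketch): if the first entry looks like
     * the free-list head, the subsection should start at object 0; otherwise
     * the declared first object number is kept unchanged.
     */
    static int expectedFirstObject(int declaredFirst, List<Entry> entries) {
        if (entries.isEmpty()) {
            return declaredFirst;
        }
        Entry head = entries.get(0);
        boolean looksLikeFreeListHead =
                head.type == 'f' && head.offset == 0 && head.generation == 65535;
        return looksLikeFreeListHead ? 0 : declaredFirst;
    }

    public static void main(String[] args) {
        // Subsection from the thread: header "1 7" followed by seven entries.
        int declaredFirst = 1;
        List<Entry> entries = parseEntries(Arrays.asList(
                "0000000000 65535 f",
                "0000000009 00000 n",
                "0000358745 00000 n",
                "0000358842 00000 n",
                "0000359029 00000 n",
                "0000359087 00000 n",
                "0000359138 00000 n"));

        int correctedFirst = expectedFirstObject(declaredFirst, entries);
        int last = entries.size() - 1;
        long lastOffset = entries.get(last).offset;

        // Object number of an entry = first object number + index in subsection.
        System.out.println("declared start " + declaredFirst + ": offset "
                + lastOffset + " -> object " + (declaredFirst + last));
        System.out.println("corrected start " + correctedFirst + ": offset "
                + lastOffset + " -> object " + (correctedFirst + last));
    }
}
```

With the declared start of 1, the last entry (offset 359138) is numbered as object 7; with a start of 0 it becomes object 6. The first numbering is consistent with the "Can't find the object 7 0 (origin offset 359138)" message reported earlier in the thread.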
>>>>>> On 11.02.2015 at 08:21, Kevin Morin wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I am sorry, it seems that I did not send you the right file...
>>>>>> Actually, I was also testing the wrong file on Linux from the beginning.
>>>>>> The file also displays blank on Linux, on both Java 7 and 8...
>>>>>> Here is the right file.
>>>>>>
>>>>>> I am sorry to make you work for nothing...
>>>>>>
>>>>>> BR
>>>>>>
>>>>>> Kevin
>>>>>>
>>>>>>
>>>>>>> On 10/02/2015 21:32, Tilman Hausherr wrote:
>>>>>>> So we e-mailed and the result is
>>>>>>> - you're really working on W2008 with the file that you sent me
>>>>>>> - you get the same error on W2008 with the app (and I don't)
>>>>>>>
>>>>>>> I have analysed that file and did some debug traces. If loading that on
>>>>>>> W2008 is a no-no, you'd have to build from source and I'll tell you the
>>>>>>> changes.
>>>>>>>
>>>>>>> http://home.snafu.de/tilman/tmp/pdfbox-app-2.0.0-TILMAN.jar
>>>>>>>
>>>>>>> Don't use that version for production. It contains lots of stuff for my
>>>>>>> own tests. Only use it for this problem. Here's the output that you
>>>>>>> should get:
>>>>>>>
>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.COSParser parseXrefStream
>>>>>>> INFORMATION: parseXrefStream: objByteOffset = 116
>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>> INFORMATION: PDFXrefStreamParser: 7 0 obj at offset: 16
>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>> INFORMATION: PDFXrefStreamParser: 8 0 obj at offset: 573
>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>> INFORMATION: PDFXrefStreamParser: 9 0 obj at offset: 633
>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>> INFORMATION: PDFXrefStreamParser: 10 0 obj at offset: 817
>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>> INFORMATION: PDFXrefStreamParser: 11 0 obj at offset: 914
>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>> INFORMATION: PDFXrefStreamParser: 12 0 obj at offset: 116
>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>> INFORMATION: PDFXrefStreamParser: 13 0 obj at offset: 436
>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.COSParser parseXrefStream
>>>>>>> INFORMATION: parseXrefStream: objByteOffset = 363505
>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>> INFORMATION: PDFXrefStreamParser: 1 0 obj at offset: 359638
>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>> INFORMATION: PDFXrefStreamParser: 2 0 obj at offset: 363167
>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>> INFORMATION: PDFXrefStreamParser: 3 0 obj at offset: 363307
>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>> INFORMATION: PDFXrefStreamParser: 4 0 obj at offset: 363505
>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>> INFORMATION: PDFXrefStreamParser: 5 stmnr: 2
>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>> INFORMATION: PDFXrefStreamParser: 6 stmnr: 3
>>>>>>>
>>>>>>> What I wonder is if the offsets will be the same.
>>>>>>>
>>>>>>> Tilman
>>>>>>>
>>>>>>> PS: Sorry, I usually can't help during EU business hours. Day job :-)
>>>>>>>
>>>>>>>
>>>>>>>> On 09.02.2015 at 11:26, Kevin Morin wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I will probably have to migrate to Java 8 because of a bug in Java 7
>>>>>>>> which throws an error when rendering a certain type of PDF (cf. thread
>>>>>>>> "Error on PDFRenderer.renderImage (PDFBox 2.0)"). Could someone please
>>>>>>>> check why it is not working on Windows Server 2008 R2 Standard? If you
>>>>>>>> do not have this OS, tell me what I can do to help you.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> BR
>>>>>>>>
>>>>>>>> Kevin
>>>>>>>>
>>>>>>>>> On 21/01/2015 12:26, Andreas Lehmkühler wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>> Kevin Morin <morin@codelutin.com> wrote on 21 January 2015 at 12:14:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I thought I was running Java 7 but it's Java 8... I tried with Java 7
>>>>>>>>>> and it works. I do not need it to work with Java 8; Java 7 is OK for
>>>>>>>>>> me.
>>>>>>>>> It works for me using Java 8 on Win7 and Linux as well. I guess the
>>>>>>>>> issue has to be something else....
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> BR
>>>>>>>>> Andreas Lehmkühler
>>>>>>>>>
>>>>>>>>>> Thanks for your help and for all your work.
>>>>>>>>>>
>>>>>>>>>> Kevin
>>>>>>>>>>
>>>>>>>>>>> On 21/01/2015 11:54, Maruan Sahyoun wrote:
>>>>>>>>>>> Hi Kevin
>>>>>>>>>>>
>>>>>>>>>>> works for me - what's your Java Version?
>>>>>>>>>>>
>>>>>>>>>>> BR
>>>>>>>>>>> Maruan
>>>>>>>>>>>
>>>>>>>>>>>> On 21.01.2015 at 11:24, Kevin Morin <morin@codelutin.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> it does not work with PDFToImage either; I still get a blank image.
>>>>>>>>>>>> Plus, I did not set the nonSeq option; however, it seems to be using
>>>>>>>>>>>> the non-sequential parser. And I have the following traces:
>>>>>>>>>>>> janv. 21, 2015 11:20:02 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser checkXrefOffsets
>>>>>>>>>>>> GRAVE: Can't find the object 7 0 (origin offset 359138)
>>>>>>>>>>>> janv. 21, 2015 11:20:03 AM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
>>>>>>>>>>>> GRAVE: Missing XObject: Im1
>>>>>>>>>>>>
>>>>>>>>>>>> BR
>>>>>>>>>>>>
>>>>>>>>>>>> Kevin
>>>>>>>>>>>>
>>>>>>>>>>>>> On 21/01/2015 11:11, Maruan Sahyoun wrote:
>>>>>>>>>>>>> Hi Kevin,
>>>>>>>>>>>>>
>>>>>>>>>>>>> you can test with the PDFToImage command [1], available in the
>>>>>>>>>>>>> pdfbox-app [2], to see if the issue happens there. The source for
>>>>>>>>>>>>> PDFToImage is available in the tools section of the SVN repo or
>>>>>>>>>>>>> viewable online [3].
>>>>>>>>>>>>>
>>>>>>>>>>>>> BR
>>>>>>>>>>>>> Maruan
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://pdfbox.apache.org/1.8/commandline.html#pdfToImage
>>>>>>>>>>>>> [2] https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/
>>>>>>>>>>>>> [3] http://svn.apache.org/viewvc/pdfbox/trunk/tools/src/main/java/org/apache/pdfbox/tools/PDFToImage.java?view=markup
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 21.01.2015 at 11:00, Kevin Morin <morin@codelutin.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Andreas,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am using the latest snapshot available on the Maven repository,
>>>>>>>>>>>>>> and I am running my app on Windows Server 2008 R2 Standard, where it
>>>>>>>>>>>>>> does not work (white page). Could you send me the code or a jar to
>>>>>>>>>>>>>> test on this server, to check whether the problem comes from my code?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> BR
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Kevin
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 19/01/2015 19:13, Andreas Lehmkuehler wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 19.01.2015 at 12:45, Kevin Morin wrote:
>>>>>>>>>>>>>>>> Actually, the issue is not only these traces. The real issue is
>>>>>>>>>>>>>>>> that I have a blank image when I try to render the document.
>>>>>>>>>>>>>>> I've checked your PDF and everything renders fine. I've tried
>>>>>>>>>>>>>>> SNAPSHOT-891 on Linux (running Java 1.8, 1.7 and 1.6) and the latest
>>>>>>>>>>>>>>> SNAPSHOT-947 on Win7 running Java 1.7.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Maybe your SNAPSHOT is outdated?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> BR
>>>>>>>>>>>>>>> Andreas Lehmkühler
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 19/01/2015 12:39, Kevin Morin wrote:
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am using the 2.0 snapshot version to create images of PDFs, but on
>>>>>>>>>>>>>>>>> some documents I get the following error when I call
>>>>>>>>>>>>>>>>> PDDocument.load(file):
>>>>>>>>>>>>>>>>> 2015/01/19 12:32:48 ERROR (org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1864) - Can't find the object 7 0 (origin offset 359138)
>>>>>>>>>>>>>>>>> 2015/01/19 12:32:48 ERROR (org.apache.pdfbox.contentstream.PDFStreamEngine:840) - Missing XObject: Im1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I first had it a few days ago (I did not report it, shame on me), but
>>>>>>>>>>>>>>>>> the error did not occur when I called the loadLegacy method on
>>>>>>>>>>>>>>>>> PDDocument. But the loadLegacy method is not available anymore...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The issue happens on Windows (works fine on Debian).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for your help
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Kevin
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>>>>>> For additional commands, e-mail: users-help@pdfbox.apache.org