pdfbox-users mailing list archives

From Marc Davis <marcdavis...@gmail.com>
Subject Re: Mollify PDF before merging
Date Fri, 03 Oct 2014 19:39:38 GMT
Tilman,

Please accept my sincere apologies for incorrectly calling you Tim!  This was a genuine oversight.

Here are my issues:

Problem 1:

We use PDFBox 1.8.7 to merge these two files:

https://www.dropbox.com/s/7lnbdieo9t8k38e/good.pdf?dl=0
https://www.dropbox.com/s/35lafjdqrt7vy3e/michael%20levine.pdf?dl=0

This is the resultant merged file:
https://www.dropbox.com/s/gwmbd053269at0p/Merged%20PDF.pdf?dl=0

The problem: page TL-9 appears black as shown here:
https://www.dropbox.com/s/09bcw1h87f5hbyy/Screenshot%202014-10-03%20at%203.28.51%20PM.png?dl=0
————————
Problem 2:

We used PDFBox 1.8.7 to merge these two files:
https://www.dropbox.com/s/7lnbdieo9t8k38e/good.pdf?dl=0
https://www.dropbox.com/s/dwlxoj2hpvbnr5i/badform.pdf?dl=0

The merge fails because badform.pdf is password-encrypted.  Does PDFBox have a
way to handle password-encrypted files?  Strangely, the file can be opened normally (without
the need to enter a password)!
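
A file that opens without a prompt can still carry an encryption dictionary, for
example when it is protected with the empty password. The sketch below is a rough
pre-merge check, not PDFBox API: the class EncryptCheck and the method
declaresEncryption are hypothetical names, and the test is a heuristic that only
looks for an "/Encrypt" name in the raw bytes, so it can report false positives.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; it does not use or replace PDFBox.
class EncryptCheck {

    // Returns true if the raw PDF bytes contain an "/Encrypt" name.
    // Heuristic: the same byte sequence could also occur inside unrelated data.
    static boolean declaresEncryption(byte[] pdfBytes) {
        // ISO-8859-1 maps each byte to exactly one char, so binary content
        // is decoded without loss and can be searched as text.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains("/Encrypt");
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path of a PDF file to inspect.
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            String verdict = declaresEncryption(bytes)
                    ? "declares /Encrypt"
                    : "no /Encrypt entry found";
            System.out.println(name + ": " + verdict);
        }
    }
}
```

Running it as "java EncryptCheck good.pdf badform.pdf" before a merge would show
which inputs need a password (possibly the empty one) when they are loaded.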

We had another 8 files that did not merge properly with 1.8.6, but now merge fine with 1.8.7.
Only the two issues above are outstanding.

Thanks,
Marc



On Oct 2, 2014, at 3:23 PM, Tilman Hausherr <THausherr@t-online.de> wrote:

> Am 02.10.2014 um 20:28 schrieb Marc Davis:
>> Tim, 1.8.7 seems to have fixed all our issues!  Thanks so much for recommending this.
> 
> I'm "Tilman". "Tim" is a (very nice) committer from Apache TIKA, a project that does
> use PDFBox.
> 
>> We do have two images that seem troublesome:
>> 
>> https://www.dropbox.com/s/35lafjdqrt7vy3e/michael%20levine.pdf?dl=0 (after merging,
>> the TL-9 page is black)
> 
> Then please post the other file, and the result. In other words - just assume we're dumb
> and lazy, so please provide every file / step that produces an error, and rather describe more
> than needed. Even then, solutions may take some time:
> https://issues.apache.org/jira/browse/PDFBOX-1511
> took over a year and was a group effort of at least six people.
> 
> And there's a contradiction: you're writing "1.8.7 seems to have fixed all our issues",
> but then you're mentioning two new problems...
> 
>> https://www.dropbox.com/s/dwlxoj2hpvbnr5i/badform.pdf?dl=0 (file is password protected,
>> does PDFBox have a way around this?)
> 
> I was able to display it in the browser. I didn't test it with PDFBox; some files are
> protected with the empty password. If you use the new nonSeq parser (loadNonSeq()), just pass
> "" as the extra parameter. If you use load(), it is more complex: use openProtection()
> (download the source code to see how).
> 
> Tilman
> 
>> 
>> I’d love to hear your thoughts on this...
>> 
>> Thanks,
>> Marc
>> 
>> 
>> 
>> On Oct 1, 2014, at 3:18 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>> 
>>> Then please retry with 1.8.7, because the problem should be fixed there, hopefully.
>>> (A problem related to identically named resources in both PDF files)
>>> 
>>> If it still happens, please open an issue in JIRA, and attach the two PDF files
>>> and the result. If the files are confidential, please try producing non-confidential files.
>>> 
>>> Tilman
>>> 
>>> Am 01.10.2014 um 21:13 schrieb Marc Davis:
>>>> I am using v1.8.6
>>>> 
>>>> Thanks,
>>>> Marc
>>>> 
>>>> 
>>>> 
>>>> On Oct 1, 2014, at 3:08 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>> 
>>>>> What version are you using? We recently fixed a bug with merge.
>>>>> 
>>>>> Tilman
>>>>> 
>>>>> Am 01.10.2014 um 15:21 schrieb Marc Davis:
>>>>>> I use PDFBox to merge PDF files, but we find that many files from
>>>>>> scanners or files generated from AutoCAD do not merge properly (they are either blank or missing
>>>>>> fonts).  However, when we open and save the file in a native reader such as Adobe Reader (Windows)
>>>>>> or Preview on Mac, and then merge again, the merge works fine!
>>>>>> 
>>>>>> Is there a workaround for this in PDFBox?
>>>>>> 
>>>>>> Thanks,
>>>>>> Marc
>>>>>> 
>>>>>> 
>>>>>> 
> 

