pdfbox-users mailing list archives

From Thad Humphries <thad.humphr...@gmail.com>
Subject Re: How do I analyze a problem PDF?
Date Wed, 01 Mar 2017 11:29:23 GMT
On Wed, Mar 1, 2017 at 3:29 AM, Tilman Hausherr <THausherr@t-online.de>
wrote:

> On 28.02.2017 at 23:51, Thad Humphries wrote:
>
>> No, the document has not been closed prematurely.
>>
>
> and what's that?
>
> inDoc.close();
>

Well how about that?! When I comment out closing the second document, it
works. Why? I've merged many PDFs, and all work when inDoc is closed.

inDoc is the second file (source; my odd file, moroccan_chicken.pdf). It's
the second parameter to PDFMergerUtility appendDocument():

public void appendDocument(PDDocument destination,
                           PDDocument source)
                    throws IOException

PDDocument document is destination.
PDDocument inDoc is source.

inDoc is out of scope by the time document.save() is called.

Is there any harm in keeping inDoc open? There could be *many* instances of
it before I'm done: I open a PDDocument, and add images and other PDFs to
it before sending it to a browser with

    ServletUtils.sendPDFHeader(filename, response); // my utility.
    ServletOutputStream out = response.getOutputStream();
    document.save(out);
    document.close();
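
One way to keep many source documents open without losing track of them is
to collect them and close them all only after the destination has been saved.
The sketch below is a minimal illustration of that idea, assuming (as this
thread suggests) that sources passed to appendDocument() must stay open until
document.save() has finished. DeferredCloser is a hypothetical helper, not
part of PDFBox; it relies only on java.io.Closeable, which PDDocument
implements.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper (not part of PDFBox): remembers resources that must
 * stay open for now and closes them all later, most recently added first.
 */
final class DeferredCloser implements Closeable {
    private final Deque<Closeable> pending = new ArrayDeque<>();

    /** Registers a resource and returns it so the call can be inlined. */
    <T extends Closeable> T register(T resource) {
        pending.push(resource);
        return resource;
    }

    /** Number of resources still waiting to be closed. */
    int pendingCount() {
        return pending.size();
    }

    /**
     * Closes every registered resource. All of them are attempted; the first
     * IOException is rethrown with any later ones attached as suppressed.
     */
    @Override
    public void close() throws IOException {
        IOException first = null;
        while (!pending.isEmpty()) {
            try {
                pending.pop().close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

With PDFBox 2.0.x this could presumably be used as
PDDocument inDoc = closer.register(PDDocument.load(buf1)); before each
merger.appendDocument(document, inDoc); call, with closer.close() invoked
only after document.save(out) and document.close() have returned.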

> ....
>
> document.save();
>
> Tilman
>
>> It's being processed through the same calls that I use for all my other
>> documents that must merge a PDF, either ones I create or ones from a
>> repository. In this code
>>
>>      ...
>>      File outpath = new File(OUT_DIR, "mergedTwoPdfs.pdf");
>>      document.save(new FileOutputStream(outpath.toString()));
>>      ...
>>
>> when I trace the execution in Eclipse, document's close member is false
>> immediately before calling document.save().
>>
>> Something else interesting: When I use PDFMerger to merge the good, 40K
>> PDF with another document, the output PDF is only 2K larger than the two
>> files themselves. But when I merge the original 39K PDF, the output is
>> almost 40K larger than the two files.
>>
>> The code below will cause the error. The two portions in curly brackets
>> are (essentially) the merge method in my PrintToPdf class. The stack trace
>> is the same. The odd PDF is the second one, "moroccan_chicken.pdf".
>>
>> I can see about a place to post it tomorrow, through a short-term
>> anonymous FTP at my office. In the meanwhile I'll see if anything from
>> PDFDebugger makes sense to me. :)
>>
>>
>>    @Test
>>    public void testMerge2PdfDocs() throws Exception {
>>      File file0 = new File(this.getClass()
>>          .getResource("/Bacon and Brussels Sprout Hash.pdf").toURI());
>>      byte [] buf0 = IOUtils.toByteArray(new FileInputStream(file0));
>>      File file1 = new File(this.getClass()
>>          .getResource("/moroccan_chicken.pdf").toURI());
>>      byte [] buf1 = IOUtils.toByteArray(new FileInputStream(file1));
>>      PDDocument document = new PDDocument();
>>      {
>>        PDFMergerUtility merger = new PDFMergerUtility();
>>        PDDocument inDoc = PDDocument.load(buf0);
>>        merger.appendDocument(document, inDoc);
>>        inDoc.close();
>>      }
>>      {
>>        PDFMergerUtility merger = new PDFMergerUtility();
>>        PDDocument inDoc = PDDocument.load(buf1);
>>        merger.appendDocument(document, inDoc);
>>        inDoc.close();
>>      }
>>      File outpath = new File(OUT_DIR, "mergedTwoPdfs.pdf");
>>      document.save(new FileOutputStream(outpath.toString()));
>>      document.close();
>>      assert true;
>>    }
>>
>> On Tue, Feb 28, 2017 at 5:30 PM, Tilman Hausherr <THausherr@t-online.de>
>> wrote:
>>
>>> The best would be to upload the PDF somewhere, and also post your code.
>>>
>>> I analyse PDFs sometimes with NOTEPAD++, sometimes with PDFDebugger, and
>>> often both. But these help only those who know what to expect.
>>>
>>> The text below looks like a COSStream was closed prematurely (did you
>>> close the source documents too early?). I'd rather suspect a bug in your
>>> code or in our code.
>>>
>>> Tilman
>>>
>>>
>>>
>>>
>>> On 28.02.2017 at 23:16, Thad Humphries wrote:
>>>
>>>> I have a PDF 1.4 document that opens in different PDF viewers without
>>>> warnings, yet there seems to be something odd about it. How might I
>>>> analyze it?
>>>>
>>>> If I merge this PDF from the command line with pdfbox-app-2.0.4.jar's
>>>> PDFMerger, the output is fine. However, anytime I merge it in my own
>>>> code, where it is first read into a byte array, loaded into a document,
>>>> and then passed to PDFMergerUtility's appendDocument(destination,
>>>> source), the destination PDDocument cannot be saved to disk. This is the
>>>> only PDF of several dozen I've tested with this problem. I see nothing
>>>> odd when I trace the program in a debugger. It fails only at PDDocument
>>>> save() (stack trace below).
>>>>
>>>> If I open the original PDF (39K) in MacOSX Yosemite's Preview and save
>>>> it, the saved PDF is now 40K, and it merges just fine in my code. (I
>>>> believe this PDF was created about 5 years ago, most likely with the
>>>> export-to-PDF action using OpenOffice on Linux.)
>>>>
>>>> I expect that I'll see odd PDFs like this from time to time. (Lord knows
>>>> that I've amassed quite a collection of buggy TIFF images over the
>>>> years.) With TIFFs I can find out a lot using libtiff's tiffinfo and
>>>> tiffdump utilities and a hex editor. Are there any routines in PDFBox
>>>> that might help me with PDF files? Are there any other tools, open
>>>> source or commercial?
>>>>
>>>>
>>>>
>>>> Stack trace from JUnit:
>>>>
>>>> java.io.IOException: COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
>>>> at org.apache.pdfbox.cos.COSStream.checkClosed(COSStream.java:77)
>>>> at org.apache.pdfbox.cos.COSStream.createRawInputStream(COSStream.java:125)
>>>> at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1200)
>>>> at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:383)
>>>> at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:158)
>>>> at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:522)
>>>> at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObjects(COSWriter.java:460)
>>>> at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:444)
>>>> at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1096)
>>>> at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:419)
>>>> at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1367)
>>>> at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1254)
>>>> at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1232)
>>>> at com.jthad.util.image.TestPrintToPdf.testMergeTwoPdfDocs(TestPrintToPdf.java:192)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>>> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>>>> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>>>> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>>>> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>>>> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>>>> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>>>> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>>>> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>>>> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>>>> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>>>> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>>>> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>>>> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>>>> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>>>> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>>>> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>>>> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>>>> at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
>>>> at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>>>> at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
>>>> at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:678)
>>>> at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
>>>> at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>>
>>>
>>>
>>
>


-- 
"Hell hath no limits, nor is circumscrib'd In one self-place; but where we
are is hell, And where hell is, there must we ever be" --Christopher
Marlowe, *Doctor Faustus* (v. 121-24)
