pdfbox-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thad Humphries <thad.humphr...@gmail.com>
Subject How do I analyze a problem PDF?
Date Tue, 28 Feb 2017 22:16:53 GMT
I have a PDF 1.4 document that opens in different PDF viewers without
warnings, yet there seems to be something odd about it. How might I analyze
it?

If I merge this PDF from the command line with pdfbox-app-2.0.4.jar's
PDFMerger, the output is fine. However anytime I merge it in my own code,
where it is first opened into a byte array, loaded to a document, then call
PDFMergerUtility's appendDocument(destination, source), the
destination PDDocument cannot be saved to disk. This is the only PDF of
several dozen I've tested with this problem. I see nothing odd when I trace
the program in a debugger. It fails only at PDDocument save() (stack trace
below).

If I open the original PDF (39K) in MacOSX Yosemite's Preview and save it,
the saved PDF is now 40K, and it merges just fine in my code. (I believe
this PDF was created about 5 years ago, most likely with the export-to-PDF
action using OpenOffice on Linux. )

I expect that I'll see odd PDFs like this from time to time. (Lord knows
that I've amassed quite a collection of buggy TIFF images over the years.)
With TIFFs I can find out a lot using libtiff's tiffinfo and tiffdump
utilities and a hex editor. Are there any routines in PDFBox that might
help me with PDF files? Are there any other tools, open source or
commercial?


Stack trace from JUnit:

java.io.IOException: COSStream has been closed and cannot be read. Perhaps
its enclosing PDDocument has been closed?
at org.apache.pdfbox.cos.COSStream.checkClosed(COSStream.java:77)
at org.apache.pdfbox.cos.COSStream.createRawInputStream(COSStream.java:125)
at
org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1200)
at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:383)
at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:158)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:522)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObjects(COSWriter.java:460)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:444)
at
org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1096)
at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:419)
at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1367)
at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1254)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1232)
at
com.jthad.util.image.TestPrintToPdf.testMergeTwoPdfDocs(TestPrintToPdf.java:192)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
at
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:678)
at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)


-- 
"Hell hath no limits, nor is circumscrib'd In one self-place; but where we
are is hell, And where hell is, there must we ever be" --Christopher
Marlowe, *Doctor Faustus* (v. 121-24)

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message