From: Thad Humphries
Date: Tue, 28 Feb 2017 17:16:53 -0500
Subject: How do I analyze a problem PDF?
To: users@pdfbox.apache.org

I have a PDF 1.4 document that opens in various PDF viewers without warnings, yet there seems to be something odd about it. How might I analyze it?

If I merge this PDF from the command line with the PDFMerger tool in pdfbox-app-2.0.4.jar, the output is fine. However, whenever I merge it in my own code, where it is first read into a byte array and loaded into a PDDocument before I call PDFMergerUtility's appendDocument(destination, source), the destination PDDocument cannot be saved to disk. Of the several dozen PDFs I've tested, this is the only one with this problem. I see nothing odd when I trace the program in a debugger; it fails only at PDDocument.save() (stack trace below).

If I open the original PDF (39K) in Preview on Mac OS X Yosemite and save it, the saved PDF is 40K, and it merges just fine in my code. (I believe this PDF was created about five years ago, most likely with OpenOffice's export-to-PDF action on Linux.)

I expect that I'll see odd PDFs like this from time to time. (Lord knows I've amassed quite a collection of buggy TIFF images over the years.) With TIFFs I can find out a lot using libtiff's tiffinfo and tiffdump utilities and a hex editor. Are there any routines in PDFBox that might help me with PDF files? Are there any other tools, open source or commercial?

Stack trace from JUnit:

java.io.IOException: COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
    at org.apache.pdfbox.cos.COSStream.checkClosed(COSStream.java:77)
    at org.apache.pdfbox.cos.COSStream.createRawInputStream(COSStream.java:125)
    at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1200)
    at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:383)
    at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:158)
    at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:522)
    at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObjects(COSWriter.java:460)
    at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:444)
    at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1096)
    at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:419)
    at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1367)
    at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1254)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1232)
    at com.jthad.util.image.TestPrintToPdf.testMergeTwoPdfDocs(TestPrintToPdf.java:192)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:678)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)

-- 
"Hell hath no limits, nor is circumscrib'd
In one self-place; but where we are is hell,
And where hell is, there must we ever be"
--Christopher Marlowe, *Doctor Faustus* (v. 121-24)
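A rough PDF counterpart to a tiffdump or hex-editor pass is to scan the file's raw bytes for structural markers. The minimal sketch below does this using only the Java standard library. The class and method names (PdfSkim, analyze) are hypothetical and are not part of PDFBox. It is a regex heuristic, not a real parser. It reports:

- the %PDF-x.y header version;
- the number of %%EOF markers (more than one usually means the file was incrementally updated);
- how many streams are opened;
- the byte offset of each "N G obj" header.

Objects stored inside compressed object streams (PDF 1.5 and later) will not appear, and binary stream data can produce false matches.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: a regex-based skim of a PDF's raw bytes.
 * This is NOT a parser. It misses objects inside compressed object streams
 * (PDF 1.5+) and can be fooled by marker-like text inside stream data.
 */
public class PdfSkim {

    /** One "N G obj" header found in the file, with its byte offset. */
    public static final class ObjHeader {
        public final int number;
        public final int generation;
        public final int offset;

        ObjHeader(int number, int generation, int offset) {
            this.number = number;
            this.generation = generation;
            this.offset = offset;
        }

        @Override
        public String toString() {
            return number + " " + generation + " obj @ byte " + offset;
        }
    }

    /** Summary of what the skim found. */
    public static final class Report {
        public String version;      // from the %PDF-x.y header, or null if absent
        public int eofMarkers;      // count of %%EOF; more than one usually means incremental updates
        public int streamKeywords;  // count of "stream" keywords that open a stream (not "endstream")
        public final List<ObjHeader> objects = new ArrayList<>();
    }

    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d\\.\\d)");
    private static final Pattern OBJ = Pattern.compile("(?<![0-9])(\\d+)\\s+(\\d+)\\s+obj\\b");
    private static final Pattern STREAM = Pattern.compile("(?<![a-z])stream\\r?\\n");
    private static final Pattern EOF = Pattern.compile("%%EOF");

    public static Report analyze(byte[] data) {
        // ISO-8859-1 maps each byte to exactly one char, so char offsets equal byte offsets.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        Report r = new Report();

        Matcher m = HEADER.matcher(text);
        if (m.find()) {
            r.version = m.group(1);
        }
        m = EOF.matcher(text);
        while (m.find()) {
            r.eofMarkers++;
        }
        m = STREAM.matcher(text);
        while (m.find()) {
            r.streamKeywords++;
        }
        m = OBJ.matcher(text);
        while (m.find()) {
            r.objects.add(new ObjHeader(Integer.parseInt(m.group(1)),
                    Integer.parseInt(m.group(2)), m.start()));
        }
        return r;
    }

    public static void main(String[] args) throws IOException {
        Report r = analyze(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("Header version:  " + r.version);
        System.out.println("%%EOF markers:   " + r.eofMarkers);
        System.out.println("stream keywords: " + r.streamKeywords);
        System.out.println("Object headers:  " + r.objects.size());
        for (ObjHeader h : r.objects) {
            System.out.println("  " + h);
        }
    }
}
```

Usage: java PdfSkim problem.pdf. One way to look for structural differences is to run the skim on both the original 39K file and the 40K copy re-saved by Preview, then compare the two listings.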