pdfbox-dev mailing list archives

From "Jim deVos (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PDFBOX-3142) PDFMergerUtility with scratch file generates result with blank pages for certain source files.
Date Wed, 02 Dec 2015 21:43:11 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036699#comment-15036699 ]

Jim deVos commented on PDFBOX-3142:
-----------------------------------

Quick update: I rewrote the above test using PDFBox 2.0.0RC2. The test passes, and the resulting
PDF doesn't have the blank pages I see when using 1.8.10. The API looks pretty straightforward,
but please let me know if I'm not actually using a scratch file:

{code}
    @Test
    public void testMergeWithScratchFiles() throws IOException {
        MemoryUsageSetting settings = MemoryUsageSetting.setupTempFileOnly().setTempDir(ROOT_DIR);
        File result = new File(ROOT_DIR, "result.pdf");
        PDFMergerUtility ut = new PDFMergerUtility();
        ut.addSource(coverpage);
        ut.addSource(document);
        ut.setDestinationFileName(result.getCanonicalPath());
        ut.mergeDocuments(settings);
        assertThat(result.length(), is( greaterThan(document.length())));
    }
{code}
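
Both tests rely on the same size comparison, so it could be pulled into a small helper. Below is a minimal sketch using only java.io; the class name MergeSizeCheck and the method looksComplete are hypothetical and not part of PDFBox. It is only a heuristic: a result that is larger than its sources can still contain a blank page, so it flags obviously truncated merges and nothing more.

```java
import java.io.File;

/** Hypothetical helper for the size heuristic used in the tests; not part of PDFBox. */
final class MergeSizeCheck {
    private MergeSizeCheck() {}

    /** Returns true only if mergedLength is strictly greater than every source length. */
    static boolean looksComplete(long mergedLength, long... sourceLengths) {
        for (long sourceLength : sourceLengths) {
            if (mergedLength <= sourceLength) {
                return false;
            }
        }
        return true;
    }

    /** Convenience overload that reads the lengths from the files on disk. */
    static boolean looksComplete(File merged, File... sources) {
        long[] lengths = new long[sources.length];
        for (int i = 0; i < sources.length; i++) {
            lengths[i] = sources[i].length();
        }
        return looksComplete(merged.length(), lengths);
    }
}
```

In the test above, the assertion could then read assertTrue(MergeSizeCheck.looksComplete(result, coverpage, document)), assuming coverpage and document are java.io.File instances.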



> PDFMergerUtility with scratch file generates result with blank pages for certain source files.
> ----------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-3142
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3142
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 1.8.10
>         Environment: Ubuntu 14.04.3, java 1.8.0_66
>            Reporter: Jim deVos
>
> My team uses PDFMergerUtility to attach cover pages to various PDFs. We recently tried using a scratch file (e.g. PDFMergerUtility.mergeDocumentsNonSeq()) to cut down on the amount of RAM we use. This approach works for the majority of PDFs in our system, but some files cause the merger utility to generate a result with a blank page. Specifically, the result PDF contains a blank page after the cover page instead of the first page of the second document sent to the merger utility.
> Whenever this problem occurs, we see the following line in our logs:
> {{org.apache.pdfbox.pdfparser.NonSequentialPDFParser - Can't find the object 52 0 (origin offset 7187557)}}
> I'll try to attach or link an example PDF soon, but currently I don't have permission to redistribute any files that exhibit the problem. However, here's a simple snippet that reproduces the problem:
> {code}
>     @Test
>     public void testMergeNonSeq() throws IOException, COSVisitorException {
>         destinationPdf = new File(TMP_FOLDER, "result-nonseq.pdf");
>         PDFMergerUtility ut = new PDFMergerUtility();
>         RandomAccess ram = new RandomAccessFile(File.createTempFile("mergeram", ".bin"), "rw");
>         ut.addSource(coverpagePdf);
>         ut.addSource(documentPdf);
>         ut.setDestinationFileName(destinationPdf.getCanonicalPath());
>         ut.mergeDocumentsNonSeq(ram);  
>         
>         // The only automated way we have to tell that something went wrong is to check the size of the result.
>         assertThat("destination pdf should be larger than the original pdf", destinationPdf.length(), is(greaterThan(documentPdf.length())));
>     }
> {code}
> Note that we only see this problem with PDFMergerUtility.mergeDocumentsNonSeq(). Using PDFMergerUtility.mergeDocuments() does not exhibit any problems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org

