pdfbox-users mailing list archives

From Jörn Haferstroh <haferst...@gmx.de>
Subject Re: Merging multiple PDF into HTTP output stream
Date Wed, 11 Dec 2013 16:35:20 GMT
Hi Maruan,

the temporary files are not a problem; I just wanted to know whether they
have to be kept open until the merge is finished. Your answer implies
that they do, so be it.
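Because the temporary files stay open for the whole merge, it can help to
watch how close a large merge (such as the 1025-source one in the
original question) gets to the open file limit. A minimal sketch,
assuming a HotSpot-based JVM on a Unix-like system, where the platform
`OperatingSystemMXBean` can be cast to
`com.sun.management.UnixOperatingSystemMXBean`; the class name
`OpenFileMonitor` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: reports how many file descriptors this JVM has open
// and the limit the JVM reports. Assumes a HotSpot-style JVM on Unix.
public class OpenFileMonitor {

    /**
     * Returns {openDescriptors, maxDescriptors}, or null when the platform
     * MXBean does not expose these counts (e.g. on Windows).
     */
    public static long[] descriptorUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = descriptorUsage();
        if (usage == null) {
            System.out.println("File descriptor counts are not available on this JVM");
        } else {
            System.out.println("Open file descriptors: " + usage[0] + " of " + usage[1]);
        }
    }
}
```

Logging `descriptorUsage()` before and during `mergeDocuments()` would show
whether the open count grows with the number of sources.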

Here is a distilled version of the code I am using:

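import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

import javax.servlet.http.HttpServletResponse;

// PDFBox 1.8.x package names
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.util.PDFMergerUtility;

private static void downloadMergedPDF(HttpServletResponse response,
        List<InputStream> documentList, String fileName)
        throws IOException, COSVisitorException {

    response.setContentType("application/pdf");
    // The size of the merged PDF is not known in advance.
    response.setContentLength(-1);
    // Quote the file name so names containing spaces are kept intact.
    response.addHeader("Content-Disposition",
            "attachment; filename=\"" + fileName + "\"");
    OutputStream output = response.getOutputStream();

    // Register every source stream, then merge directly into the response.
    PDFMergerUtility merger = new PDFMergerUtility();
    for (InputStream document : documentList) {
        merger.addSource(document);
    }
    merger.setDestinationStream(output);
    merger.mergeDocuments();

    output.flush();
    output.close();
}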

I would still like to know when the merger starts writing bytes to the
output stream: during the merge, or only after the merge has finished?
This matters because I want to estimate how long the user has to wait
before the download begins.
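One way to answer this empirically is to put a thin wrapper between the
merger and the response stream and record when the first byte arrives.
A minimal sketch; `FirstWriteTimer` is a hypothetical helper, not part of
PDFBox or the Servlet API. It would be passed to
`merger.setDestinationStream(...)` in place of the raw `output`.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical measuring wrapper: records when the first byte is written
// and how many bytes pass through to the wrapped stream.
public class FirstWriteTimer extends FilterOutputStream {
    private final long startNanos = System.nanoTime();
    private long firstWriteNanos = -1;
    private long bytesWritten;

    public FirstWriteTimer(OutputStream out) {
        super(out);
    }

    // Remember the time of the first non-empty write and count bytes.
    private void mark(int len) {
        if (firstWriteNanos < 0 && len > 0) {
            firstWriteNanos = System.nanoTime();
        }
        bytesWritten += len;
    }

    @Override
    public void write(int b) throws IOException {
        mark(1);
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        mark(len);
        // Forward the whole chunk at once; FilterOutputStream's default
        // implementation would write it one byte at a time.
        out.write(b, off, len);
    }

    /** Milliseconds from construction to the first write, or -1 if none yet. */
    public long millisUntilFirstWrite() {
        return firstWriteNanos < 0 ? -1 : (firstWriteNanos - startNanos) / 1_000_000;
    }

    public long getBytesWritten() {
        return bytesWritten;
    }
}
```

If `millisUntilFirstWrite()` turns out to be close to the total time spent
in `mergeDocuments()`, the output is only written at the end. Note that the
servlet container also buffers response output (see
`ServletResponse.setBufferSize`), so the first byte written to the stream is
not necessarily the first byte the client receives.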

Regards
Joern

On 07.12.2013 09:05, Maruan Sahyoun wrote:
> Hi Joern,
>
> you could do it completely in memory, but at the cost of memory
> consumption, as all files have to be kept until the merge finishes. So
> from my perspective, adjusting the open file limit is a better option.
>
> Maybe you can post a code snippet showing how you load the files and
> do the merging. Maybe there is an easy way to improve that.
>
> BR
> Maruan Sahyoun
>
> On 07.12.2013 at 01:01, Jörn Haferstroh <haferstroh@gmx.de> wrote:
>
>> Hi,
>>
>> first, let me give some credit to the developers of pdfbox for this
>> very useful tool. Please continue your work, guys!
>>
>> I have a web application that stores lots of PDF documents in a
>> database. For easier bulk download and printing, I am using pdfbox to
>> merge multiple PDF documents into one large PDF document for download.
>> The destination stream of the merge is the HTTP output stream, so the
>> merged PDF data goes directly to the requesting web client.
>>
>> Today I learned from a "too many open files" error that pdfbox
>> creates a temporary file for each source input stream and keeps it
>> open until the end of the merge process (I tried to merge 1025 PDF
>> sources into one PDF on a Linux box). Is this behaviour necessary,
>> maybe because of the PDF format? In any case, I was able to work
>> around it by increasing the user's open file limit.
>>
>> When does pdfbox write the first bytes to the merge output stream?
>> During the merge process, or only after the last source has been
>> merged? In other words, does the requesting web client have to wait
>> until all sources have been merged before the download starts?
>>
>> Thanks for any information
>> Joern
>>
>

