pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: How do I analyze a problem PDF?
Date Wed, 01 Mar 2017 18:32:20 GMT
On 01.03.2017 at 18:49, Thad Humphries wrote:
> On Wed, Mar 1, 2017 at 12:32 PM, Tilman Hausherr <THausherr@t-online.de>
> wrote:
>
>> On 01.03.2017 at 12:29, Thad Humphries wrote:
>>
>>> On Wed, Mar 1, 2017 at 3:29 AM, Tilman Hausherr <THausherr@t-online.de>
>>> wrote:
>>>
>>> On 28.02.2017 at 23:51, Thad Humphries wrote:
>>>> No, the document has not been closed prematurely.
>>>>> and what's that?
>>>> inDoc.close();
>>>>
>>>> Well how about that?! When I comment out closing the second document, it
>>> works. Why? I've merged many PDFs, and all work when inDoc is closed.
>>>
>>
>> Hi,
>>
>> The probable reason is that the merged document uses some resources of the
>> original documents. Maybe it's a bug, maybe not (if we clone too much, the
>> files may get too big); but the point is that if you close the original
>> document too early (by closing it explicitly, or by letting the objects go
>> out of scope), you close parts that have to stay open. Solution: first save
>> and close your destination document, then close your source documents. The
>> downside is that it will use more memory.
>>
>> Tilman
>>
> Can I repeatedly close and reopen the destination PDDocument? Can I safely
> leave source open (inDoc), and let Java clean itself when my PrintToPdf
> class is out of scope? What might be the cost of either in time and
> resources (currently, we find PDFBox pleasingly fast).

No, you should close inDoc too. Yes, you can close and reopen your
destination document. I've heard of other people doing this to reduce the
memory footprint. Of course, it will be slightly slower.
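
The ordering recommended earlier in this thread (save and close the
destination first, then close the sources) can be sketched with a small
holder that keeps an unknown number of sources open until the save is done.
This is only a sketch and not part of PDFBox: DeferredCloser and its methods
are hypothetical names, and the only PDFBox fact it relies on is that
PDDocument implements java.io.Closeable.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: keeps source documents open until it is closed itself. */
final class DeferredCloser implements Closeable {
    private final List<Closeable> pending = new ArrayList<>();

    /** Remembers a resource and returns it, so a load call can be wrapped inline. */
    <T extends Closeable> T register(T resource) {
        pending.add(resource);
        return resource;
    }

    /** Number of resources that are still waiting to be closed. */
    int pendingCount() {
        return pending.size();
    }

    /** Closes every registered resource in registration order, even if one of them fails. */
    @Override
    public void close() throws IOException {
        IOException first = null;
        for (Closeable resource : pending) {
            try {
                resource.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        pending.clear();
        if (first != null) {
            throw first;
        }
    }
}

// Possible use with PDFBox 2.0.x (assumes file, merger, document and out exist):
//
//   try (DeferredCloser sources = new DeferredCloser()) {
//       PDDocument inDoc = sources.register(PDDocument.load(file));
//       merger.appendDocument(document, inDoc);
//       // ... append further sources the same way ...
//       document.save(out);
//       document.close();
//   } // the sources are closed here, after the destination was saved
```

The trade-off is the one already mentioned: every source stays in memory
until the destination has been written.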

Btw, be careful about adding documents that you have just created. This will
not work if you use subsetted fonts; you need to save and reload such a
PDDocument first.
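
The save-and-reload step can be done in memory rather than through a
temporary file. The sketch below is library-neutral on purpose: SaveAndReload,
Saver, Loader and roundTrip are hypothetical names, and the PDFBox calls are
meant to be supplied by the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical helper: writes an object to a byte buffer and loads a fresh copy from it. */
final class SaveAndReload {
    /** Writes the freshly created document to the given stream. */
    interface Saver {
        void saveTo(OutputStream out) throws IOException;
    }

    /** Builds a new document from the saved bytes. */
    interface Loader<T> {
        T load(byte[] bytes) throws IOException;
    }

    /** Saves into an in-memory buffer, then loads a new instance from that buffer. */
    static <T> T roundTrip(Saver saver, Loader<T> loader) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        saver.saveTo(buffer);
        return loader.load(buffer.toByteArray());
    }
}

// Possible use with PDFBox 2.0.x (assumes notesDoc was just created in memory):
//
//   PDDocument reloaded = SaveAndReload.roundTrip(
//           out -> notesDoc.save(out),
//           bytes -> PDDocument.load(bytes));
//   notesDoc.close();
//   merger.appendDocument(document, reloaded);
```

The reloaded copy is a source document like any other, so it should stay
open until the destination has been saved.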

Tilman

>
> Here's my situation: A web user requests a file collection by passing the
> server an identifier. Until the collection is open, the server does not
> know how many files are in the collection, and until it steps through each,
> it does not know what type of files they are--TIFF, PDF, JPEG, text, etc. A
> new PDDocument is created, and each image file is retrieved, processed into
> a BufferedImage, and added to the document.
>
> In other cases, a single image is put into a new PDDocument, then a
> PDDocument with notes about the image is created, and appended using my
> merge method:
>
>
>>> inDoc is the second file (source; my odd file, moroccan_chicken.pdf). It's
>>> the second parameter to PDFMergerUtility appendDocument():
>>>
>>> public void appendDocument(PDDocument destination,
>>>                            PDDocument source)
>>>                     throws IOException
>>>
>>> PDDocument document is destination.
>>> PDDocument inDoc is source.
>>>
>>> inDoc is out of scope by the time document.save() is called.
>>>
>>> Is there any harm in keeping inDoc open? There could be *many* instances
>>> of it before I'm done: I open a PDDocument, and add images and other PDFs
>>> to it before sending it to a browser with
>>>
>>>       ServletUtils.sendPDFHeader(filename, response); // my utility.
>>>       ServletOutputStream out = response.getOutputStream();
>>>       document.save(out);
>>>       document.close();
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>
>>
>

