pdfbox-users mailing list archives

From Thad Humphries <thad.humphr...@gmail.com>
Subject Re: How do I analyze a problem PDF?
Date Wed, 01 Mar 2017 18:56:23 GMT
On Wed, Mar 1, 2017 at 1:32 PM, Tilman Hausherr <THausherr@t-online.de> wrote:

> On 01.03.2017 at 18:49, Thad Humphries wrote:
>> On Wed, Mar 1, 2017 at 12:32 PM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>> On 01.03.2017 at 12:29, Thad Humphries wrote:
>>>> On Wed, Mar 1, 2017 at 3:29 AM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>>> On 28.02.2017 at 23:51, Thad Humphries wrote:
>>>>> No, the document has not been closed prematurely.
>>>>>> and what's that?
>>>>> inDoc.close();
>>>> Well how about that?! When I comment out closing the second document,
>>>> it works. Why? I've merged many PDFs, and all work when inDoc is closed.
>>> Hi,
>>>
>>> The probable reason is that the merged document uses some resources of
>>> the original documents. Maybe it's a bug, maybe not (if we clone too
>>> much, the files may get too big); but the point is that if you close the
>>> original document too early (by closing it actively, or by letting the
>>> objects go out of scope) you close parts that have to stay open.
>>> Solution: first save and close your destination document, then close
>>> your source documents. The downside is that it will use more memory.
>>>
>>> Tilman
>>
>> Can I repeatedly close and reopen the destination PDDocument? Can I
>> safely leave the source (inDoc) open and let Java clean up when my
>> PrintToPdf class goes out of scope? What might be the cost of either in
>> time and resources? (Currently, we find PDFBox pleasingly fast.)
> No, you should close inDoc too. Yes, you can close and reopen your
> destination document. I've heard of other people doing this to reduce the
> memory footprint. Of course, it will be slightly slower.
>
> By the way, be careful about adding documents that you just created. This
> will not work if you use fonts that are subsetted; you need to save and
> reload such a PDDocument.
>
> Tilman
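Tilman's advice (keep the sources open until the destination has been saved, then close the destination, then the sources) can be sketched as follows for PDFBox 2.0.x. This is a minimal illustration under that assumption, not code from the thread: the class `PdfAssembler` and its `saveAndClose` method are hypothetical names, and `mergePdfDoc` only mirrors Thad's method of the same name. Each source is collected in a list so it can be closed after `document.save(...)` instead of right after `appendDocument`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;

// Hypothetical helper class; mergePdfDoc mirrors the method in this thread.
public class PdfAssembler {

    private final PDDocument document = new PDDocument();        // destination
    private final List<PDDocument> sources = new ArrayList<>();  // kept open until saved
    private final PDFMergerUtility merger = new PDFMergerUtility();

    public PDDocument getDocument() {
        return document;
    }

    // Appends a PDF but does NOT close it: the destination may still
    // reference resources that belong to the source document.
    public void mergePdfDoc(byte[] buffer) throws IOException {
        PDDocument inDoc = PDDocument.load(buffer);
        sources.add(inDoc); // tracked so it is closed even if appending fails
        merger.appendDocument(document, inDoc);
    }

    // Order matters: save the destination, close it, then close every source.
    public void saveAndClose(OutputStream out) throws IOException {
        try {
            document.save(out);
        } finally {
            document.close();
            for (PDDocument source : sources) {
                source.close();
            }
            sources.clear();
        }
    }
}
```

In the servlet case, `saveAndClose(response.getOutputStream())` would replace the `document.save(out); document.close();` pair. The trade-off Tilman mentions still applies: every source stays in memory until the response has been written.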

I know that PDDocument (and COSDocument) has a close() method, but how do I
reopen a PDDocument? I see no open() method. In my merge method (below), if
the global document is closed here (before inDoc.close(), as you suggest) or
elsewhere (in another method), won't the next call to
"merger.appendDocument(document, inDoc);" fail?

  PDDocument document;  // destination, created elsewhere

  protected void mergePdfDoc(byte[] buffer) throws IOException {
    PDFMergerUtility merger = new PDFMergerUtility();
    PDDocument inDoc = PDDocument.load(buffer);  // source
    merger.appendDocument(document, inDoc);
    // ...
  }

Is an exception thrown when closing an already closed
PDDocument/COSDocument? Or when reopening an already open document?

As for fonts, the only fonts I ever use are the PDType1Font fonts, so I
think I'm okay there.
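`PDDocument` has no `open()` method; in PDFBox 2.0.x, "reopening" a document means saving it and loading the saved output into a new instance with `PDDocument.load(...)`. The sketch below assumes a temporary file and uses a hypothetical helper name, `saveAndReload`. Because the old instance is closed, a field such as `document` has to be reassigned to the returned object before the next `appendDocument` call. The same save-and-reload step is what Tilman recommends for freshly created documents that use subsetted fonts.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;

public class Reload {

    // Hypothetical helper: "reopens" a document by saving it to a file,
    // closing the old instance, and loading the file into a new instance.
    public static PDDocument saveAndReload(PDDocument doc, File file) throws IOException {
        try {
            // Per Tilman, newly created documents with subsetted fonts
            // should be saved (and reloaded) before being appended elsewhere.
            doc.save(file);
        } finally {
            doc.close(); // the old instance must not be used after this
        }
        // PDDocument.load(File) may keep reading from the file, so do not
        // delete it until the returned document has been closed.
        return PDDocument.load(file);
    }
}
```

Usage would be along the lines of `document = Reload.saveAndReload(document, tmpFile);`, after which `merger.appendDocument(document, inDoc)` works on the new instance. Saving to a file rather than a byte array avoids holding a second full copy of the PDF in memory.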

>> Here's my situation: A web user requests a file collection by passing the
>> server an identifier. Until the collection is opened, the server does not
>> know how many files are in the collection, and until it steps through
>> each one, it does not know what type of files they are (TIFF, PDF, JPEG,
>> text, etc.). A new PDDocument is created, and each image file is
>> retrieved, processed into a BufferedImage, and added to the document.
>>
>> In other cases, a single image is put into a new PDDocument, then a
>> PDDocument with notes about the image is created and appended using my
>> merge method:
>>>> inDoc is the second file (source; my odd file, moroccan_chicken.pdf). It's
>>>> the second parameter to PDFMergerUtility appendDocument():
>>>>
>>>>     public void appendDocument(PDDocument destination, PDDocument source)
>>>>                         throws IOException
>>>>
>>>> PDDocument document is destination.
>>>> PDDocument inDoc is source.
>>>> inDoc is out of scope by the time document.save() is called.
>>>>
>>>> Is there any harm in keeping inDoc open? There could be *many* instances
>>>> of it before I'm done: I open a PDDocument, and add images and other PDFs
>>>> to it before sending it to a browser with
>>>>
>>>>       ServletUtils.sendPDFHeader(filename, response); // my utility.
>>>>       ServletOutputStream out = response.getOutputStream();
>>>>       document.save(out);
>>>>       document.close();
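The image path described in this thread (each file processed into a `BufferedImage` and added to the document) might look like this in PDFBox 2.0.x. `ImagePages` and `addImagePage` are hypothetical names. The page is sized to the image at one point per pixel, and `LosslessFactory` is used for simplicity (`JPEGFactory` is the lossy alternative). Unlike pages appended from another PDF, the image XObject created here belongs to the destination document, so there is no second document to keep open.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

public class ImagePages {

    // Adds one page sized to the image (1 pixel = 1 point) and draws the image on it.
    public static void addImagePage(PDDocument document, BufferedImage image) throws IOException {
        // The image XObject is created inside the destination document itself.
        PDImageXObject xObject = LosslessFactory.createFromImage(document, image);
        PDPage page = new PDPage(new PDRectangle(image.getWidth(), image.getHeight()));
        document.addPage(page);
        try (PDPageContentStream cs = new PDPageContentStream(document, page)) {
            cs.drawImage(xObject, 0, 0, image.getWidth(), image.getHeight());
        }
    }
}
```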
"Hell hath no limits, nor is circumscrib'd In one self-place; but where we
are is hell, And where hell is, there must we ever be" --Christopher
Marlowe, *Doctor Faustus* (v. 121-24)
