pdfbox-dev mailing list archives

From "Tilman Hausherr (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (PDFBOX-3852) Overlay a pdf file which is 750 pages ends up in OutOfMemoryError
Date Wed, 05 Jul 2017 16:43:00 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16073904#comment-16073904 ]

Tilman Hausherr edited comment on PDFBOX-3852 at 7/5/17 4:42 PM:
-----------------------------------------------------------------

{quote}
Great! I'm glad it is going to be applied!
{quote}
No, that's not what I wrote. Your patch looks incorrect, and I asked you to correct it. But
ignore that for now. I think I understand your ideas.

What I tried to do is:
- understand your text
- reproduce the problem you mention
- understand your patch. Apparently it sets a scratch file for the watermark document and
uses a map for the watermark documents.

Re "understand your text": what do you mean by "happens based on jetty running time memory
setting, and pdf file size"? Do you mean it depends on memory and file size?

Please attach the file watermarked.pdf. I tried with the file mcafee.pdf and it worked without
the patch; however, the result file had a size of 91 MB! This makes me doubt whether the
Overlay class should be used this way at all, i.e. shouldn't identical files use the same
internal document?

So my proposal is different and uses only one of your ideas (the map). Please try it in your
project, without the other changes:
{code}
    public PDDocument overlay(Map<Integer, String> specificPageOverlayFile)
            throws IOException
    {
        // cache: each overlay file is loaded only once, keyed by its file name
        Map<String, PDDocument> loadedDocuments = new HashMap<String, PDDocument>();
        // cache: one layout page per loaded overlay document
        Map<PDDocument, LayoutPage> layouts = new HashMap<PDDocument, LayoutPage>();
        loadPDFs();
        for (Map.Entry<Integer, String> e : specificPageOverlayFile.entrySet())
        {
            PDDocument doc = loadedDocuments.get(e.getValue());
            if (doc == null)
            {
                // first time this file name is seen: load it and remember it
                doc = loadPDF(e.getValue());
                loadedDocuments.put(e.getValue(), doc);
                layouts.put(doc, getLayoutPage(doc));
            }
            // pages that name the same file share the same document and layout
            specificPageOverlay.put(e.getKey(), doc);
            specificPageOverlayPage.put(e.getKey(), layouts.get(doc));
        }
        processPages(inputPDFDocument);
        return inputPDFDocument;
    }
{code}
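
To illustrate the map idea independently of PDFBox, here is a minimal stand-alone sketch. It is not the Overlay implementation: FakeDocument, load() and resolve() are hypothetical names invented for this example, and load() only counts calls instead of parsing a PDF. It shows that 750 page entries naming the same file lead to a single load and a single shared document object.

```java
import java.util.HashMap;
import java.util.Map;

class OverlayCacheSketch
{
    /** Hypothetical stand-in for a loaded PDDocument; not a PDFBox class. */
    static final class FakeDocument
    {
        final String fileName;

        FakeDocument(String fileName)
        {
            this.fileName = fileName;
        }
    }

    /** Counts how often the (pretend) expensive load is executed. */
    static int loadCount = 0;

    /** Hypothetical loader; a real one would parse the PDF file. */
    static FakeDocument load(String fileName)
    {
        loadCount++;
        return new FakeDocument(fileName);
    }

    /** Maps each page number to a document, loading every file name only once. */
    static Map<Integer, FakeDocument> resolve(Map<Integer, String> pageToFile)
    {
        Map<String, FakeDocument> loaded = new HashMap<String, FakeDocument>();
        Map<Integer, FakeDocument> result = new HashMap<Integer, FakeDocument>();
        for (Map.Entry<Integer, String> e : pageToFile.entrySet())
        {
            FakeDocument doc = loaded.get(e.getValue());
            if (doc == null)
            {
                // first occurrence of this file name: load and cache it
                doc = load(e.getValue());
                loaded.put(e.getValue(), doc);
            }
            result.put(e.getKey(), doc);
        }
        return result;
    }

    public static void main(String[] args)
    {
        Map<Integer, String> guide = new HashMap<Integer, String>();
        for (int page = 1; page <= 750; page++)
        {
            guide.put(page, "watermarked.pdf");
        }
        Map<Integer, FakeDocument> perPage = resolve(guide);
        // prints "pages: 750, loads: 1"
        System.out.println("pages: " + perPage.size() + ", loads: " + loadCount);
    }
}
```

Without the map, the same guide would trigger 750 loads and keep 750 separate copies of the watermark in memory, which is presumably also why the result file grows so large.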

Re the patch: it should be made against the trunk. But test the code above first.

-I also doubt that your change in createStream helps much-, these are tiny streams. And there
is no need to close the scratch files separately.

So IMHO we can also use your scratch file proposal as a setting, but only for the input document.



> Overlay a pdf file which is 750 pages ends up in OutOfMemoryError
> -----------------------------------------------------------------
>
>                 Key: PDFBOX-3852
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3852
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.6
>         Environment: Ubuntu, jetty
>            Reporter: ryuukei
>            Assignee: Tilman Hausherr
>              Labels: Overlay
>         Attachments: 750-pages.pdf, McAfee.pdf, Overlay.patch, watermarked.pdf
>
>
> We found an issue and a solution for it. You might be interested in having a look to see
> whether it is worth applying the attached patch to benefit more pdfbox users. :-) And a bit
> more detail: this error happens based on jetty running time memory setting, and pdf file
> size.
> * Application platform:
> Ubuntu, jetty
> * The test case to reproduce this issue:
> Add a simple overlay to all pages (in this case 750 pages). The processPages function
> exhausts the JVM heap while applying the overlay to the file.
> * Sample code for using the pdfbox overlay:
> {code}
>  PDDocument document = PDDocument.load( pdf );
>  HashMap<Integer, String> overlayGuide = new HashMap<Integer, String>();
>  // pageNumber: the number of pages in the input document (750 in this case)
>  for (int i = 0; i < pageNumber; i++)
>  {
>    // "watermarked.pdf" is meant to be a file which contains the watermarks for the page
>    overlayGuide.put(i+1, "watermarked.pdf");
>  }
>  Overlay overlay = new Overlay();
>  overlay.setInputPDF( document );
>  overlay.setOverlayPosition( Overlay.Position.FOREGROUND );
>  PDDocument overlayResult = overlay.overlay( overlayGuide );
> {code}
> * Error log:
> {code}
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | java.lang.OutOfMemoryError: Java heap space
> STATUS | wrapper  | main    | 2017/07/03 13:06:23 | Filter trigger matched.  Restarting JVM.
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | 	at org.apache.pdfbox.io.ScratchFile.<init>(ScratchFile.java:128)
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | 	at org.apache.pdfbox.io.ScratchFile.getMainMemoryOnlyInstance(ScratchFile.java:143)
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | 	at org.apache.pdfbox.cos.COSStream.<init>(COSStream.java:55)
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | 	at org.apache.pdfbox.multipdf.Overlay.createStream(Overlay.java:***)
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | 	at org.apache.pdfbox.multipdf.Overlay.processPages(Overlay.java:364)
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | 	at org.apache.pdfbox.multipdf.Overlay.overlay(Overlay.java:128)
> {code}
> * Solution
> Apply a MemoryUsageSetting to Overlay, which allows Overlay to use a file as temporary output.
> * Update for the Overlay usage:
> {code}
>  PDDocument document = PDDocument.load( pdf );
>  HashMap<Integer, String> overlayGuide = new HashMap<Integer, String>();
>  // pageNumber: the number of pages in the input document (750 in this case)
>  for (int i = 0; i < pageNumber; i++)
>  {
>    overlayGuide.put(i+1, "watermarked.pdf");
>  }
>  Overlay overlay = new Overlay();
>  overlay.setInputPDF( document );
>  overlay.setOverlayPosition( Overlay.Position.FOREGROUND );
>  // make the overlay write its temporary output to a temp file rather than to memory
>  MemoryUsageSetting memoryUsageSetting = MemoryUsageSetting.setupTempFileOnly(  );
>  memoryUsageSetting.setTempDir( new File ( "someTempWorkingDir" ) );
>  overlay.setMemoryUsageSetting( memoryUsageSetting );
>  PDDocument overlayResult = overlay.overlay( overlayGuide );
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
