Date: Tue, 4 Jul 2017 16:50:00 +0000 (UTC)
From: "Tilman Hausherr (JIRA)"
To: dev@pdfbox.apache.org
Subject: [jira] [Commented] (PDFBOX-3852) Overlay a pdf file which is 750 pages ends up in OutOfMemoryError

    [ https://issues.apache.org/jira/browse/PDFBOX-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16073904#comment-16073904 ]

Tilman Hausherr commented on PDFBOX-3852:
-----------------------------------------

{quote}
Great! I'm glad it is going to be applied!
{quote}

No, that's not what I wrote. Your patch looks incorrect, and I asked you to correct it. But ignore that for now. I think I understood your ideas. What I tried to do was:
- understand your text
- reproduce the problem you mention
- understand your patch. Apparently it sets a scratch file for the watermark document and uses a map for the watermark documents.

Re "understand your text": what do you mean by "happens based on jetty running time memory setting, and pdf file size"? Do you mean it depends on memory and file size? Please attach the file watermarked.pdf.

I tried with the file mcafee.pdf and it worked without the patch; however, the result file had a size of 91 MB! That makes me doubt whether the Overlay class should be used this way at all, i.e.
shouldn't identical files use the same internal document? So my proposal is different: it uses only one of your ideas (the map). Please try it in your project, without the other changes:
{code}
public PDDocument overlay(Map<Integer, String> specificPageOverlayFile) throws IOException
{
    // load each distinct overlay file only once and share it among all pages that use it
    Map<String, PDDocument> loadedDocuments = new HashMap<String, PDDocument>();
    Map<PDDocument, LayoutPage> layouts = new HashMap<PDDocument, LayoutPage>();
    loadPDFs();
    for (Map.Entry<Integer, String> e : specificPageOverlayFile.entrySet())
    {
        PDDocument doc = loadedDocuments.get(e.getValue());
        if (doc == null)
        {
            doc = loadPDF(e.getValue());
            loadedDocuments.put(e.getValue(), doc);
            layouts.put(doc, getLayoutPage(doc));
        }
        specificPageOverlay.put(e.getKey(), doc);
        specificPageOverlayPage.put(e.getKey(), layouts.get(doc));
    }
    processPages(inputPDFDocument);
    return inputPDFDocument;
}
{code}

Re the patch: it should be made against trunk. But test the code above first. I also doubt that your change in createStream helps much; these are tiny streams. And there is no need to close the scratch files separately. So IMHO we could also use your scratch file proposal as a setting, but only for the input document.

> Overlay a pdf file which is 750 pages ends up in OutOfMemoryError
> -----------------------------------------------------------------
>
>                 Key: PDFBOX-3852
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3852
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.6
>         Environment: Ubuntu, jetty
>            Reporter: ryuukei
>              Labels: Overlay
>         Attachments: 750-pages.pdf, McAfee.pdf, Overlay.patch
>
>
> We found an issue and a solution to fix it. You might be interested in having a look and seeing whether it is worth applying the attached patch to benefit more PDFBox users. :-) A bit more detail: this error happens based on jetty running time memory setting, and pdf file size.
> * Application platform:
> Ubuntu, jetty
> * The test case to produce this issue:
> Add a simple overlay to all pages (in this case 750 pages).
> The processPages function exhausts the JVM heap while applying the overlay to the file.
> * Sample code for using the PDFBox overlay:
> {code}
> PDDocument document = PDDocument.load( pdf );
> HashMap<Integer, String> overlayGuide = new HashMap<Integer, String>();
> for (int i = 0; i < document.getNumberOfPages(); i++)
> {
>     // "watermarked.pdf" is meant to be a file which contains the watermark for the page
>     overlayGuide.put(i+1, "watermarked.pdf");
> }
> Overlay overlay = new Overlay();
> overlay.setInputPDF( document );
> overlay.setOverlayPosition( Overlay.Position.FOREGROUND );
> PDDocument overlayResult = overlay.overlay( overlayGuide );
> {code}
> * Error log:
> {code}
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | java.lang.OutOfMemoryError: Java heap space
> STATUS | wrapper  | main    | 2017/07/03 13:06:23 | Filter trigger matched.  Restarting JVM.
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | 	at org.apache.pdfbox.io.ScratchFile.<init>(ScratchFile.java:128)
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | 	at org.apache.pdfbox.io.ScratchFile.getMainMemoryOnlyInstance(ScratchFile.java:143)
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | 	at org.apache.pdfbox.cos.COSStream.<init>(COSStream.java:55)
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | 	at org.apache.pdfbox.multipdf.Overlay.createStream(Overlay.java:***)
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | 	at org.apache.pdfbox.multipdf.Overlay.processPages(Overlay.java:364)
> INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | 	at org.apache.pdfbox.multipdf.Overlay.overlay(Overlay.java:128)
> {code}
> * Solution
> Apply MemoryUsageSetting to Overlay, so that Overlay can use a temporary file instead of main memory for its output.
> * Updated Overlay usage:
> {code}
> PDDocument document = PDDocument.load( pdf );
> HashMap<Integer, String> overlayGuide = new HashMap<Integer, String>();
> for (int i = 0; i < document.getNumberOfPages(); i++)
> {
>     overlayGuide.put(i+1, "watermarked.pdf");
> }
> Overlay overlay = new Overlay();
> overlay.setInputPDF( document );
> overlay.setOverlayPosition( Overlay.Position.FOREGROUND );
> // make the overlay use a temp file rather than main memory for its output
> // (setMemoryUsageSetting is the method proposed in the attached Overlay.patch)
> MemoryUsageSetting memoryUsageSetting = MemoryUsageSetting.setupTempFileOnly( );
> memoryUsageSetting.setTempDir( new File ( "someTempWorkingDir" ) );
> overlay.setMemoryUsageSetting( memoryUsageSetting );
> PDDocument overlayResult = overlay.overlay( overlayGuide );
> {code}
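The core idea of the map-based proposal in the comment above is to load each distinct overlay file only once and share that one document among all pages that reference it, so 750 entries pointing at "watermarked.pdf" cost one load instead of 750. Below is a minimal, library-independent sketch of just that deduplication step. It is not PDFBox code: the type parameter D stands in for PDDocument, and the hypothetical `loader` function stands in for Overlay's internal loadPDF(String).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class OverlayCacheSketch {
    /**
     * Maps each page number to an overlay "document", calling the loader
     * only once per distinct file path. D is a stand-in for PDDocument and
     * loader is a hypothetical stand-in for a loadPDF(path) call.
     */
    static <D> Map<Integer, D> resolveOverlays(Map<Integer, String> pageToFile,
                                               Function<String, D> loader) {
        Map<String, D> loaded = new HashMap<>();     // file path -> loaded document
        Map<Integer, D> pageToDoc = new HashMap<>(); // page number -> shared document
        for (Map.Entry<Integer, String> e : pageToFile.entrySet()) {
            // computeIfAbsent invokes the loader only the first time a path is seen
            D doc = loaded.computeIfAbsent(e.getValue(), loader);
            pageToDoc.put(e.getKey(), doc);
        }
        return pageToDoc;
    }

    public static void main(String[] args) {
        // same shape as the overlay guide in the issue: every page uses one file
        Map<Integer, String> guide = new HashMap<>();
        for (int i = 0; i < 750; i++) {
            guide.put(i + 1, "watermarked.pdf");
        }
        int[] loads = {0}; // counts how often the placeholder loader runs
        Map<Integer, Object> result = resolveOverlays(guide, path -> {
            loads[0]++;
            return new Object(); // placeholder for a loaded document
        });
        // prints "pages: 750, loads: 1"
        System.out.println("pages: " + result.size() + ", loads: " + loads[0]);
    }
}
```

Map.computeIfAbsent folds the proposal's "get, and load on a miss" steps into one call; for a loader that never returns null, it behaves the same as the explicit null check in the proposed overlay method.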