From: Gilad Denneboom
Date: Mon, 3 Jun 2013 15:01:03 +0200
Subject: Re: Merging a lot of small pdf documents (1/2 pages) into one pdf document
To: users@pdfbox.apache.org, mihaela olteanu

Try loading the files with a scratch file:
http://pdfbox.apache.org/apidocs/org/apache/pdfbox/pdmodel/PDDocument.html#load(java.lang.String,%20org.apache.pdfbox.io.RandomAccess)

This will help lessen the memory load.

On Mon, Jun 3, 2013 at 2:50 PM, mihaela olteanu wrote:
> Hello,
>
> I have a use case where I need to merge a large number of small PDF
> documents (hundreds of thousands) into one PDF document.
> Currently I am using the method
> org.apache.pdfbox.util.PDFMergerUtility.appendDocument(destination, source);
> for all the source documents, rather than the mergeDocuments() method in
> the same class, because I also need to add some bookmarks. Finally, I save
> the document.
>
> Is there a better way of doing this with a lower memory footprint? I tried
> importing each page from the source documents with the method
> PDDocument.importPage(), but it still throws errors in version 1.8.2.
>
> When I call PDDocument.load(File), is the whole document loaded into memory?
> If so, saving the generated PDF after merging a subset of the documents and
> then reloading it would not decrease memory use anyway...
>
> Could somebody point me to the right way of doing this?
>
> Thanks,
> Mihaela
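Why a scratch file helps: as I understand the linked `PDDocument.load(String, RandomAccess)` overload, bulky data such as stream contents can be parked in a temporary file on disk instead of being held on the Java heap. The sketch below is only a conceptual illustration of that idea using plain `java.io.RandomAccessFile`. `ScratchStore` is a hypothetical class, not PDFBox code, and PDFBox's real scratch-file handling is more involved. Only the byte blocks go to disk; the heap keeps just an offset/length pair per block.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of the scratch-file idea (NOT PDFBox's implementation):
 * byte blocks are appended to a temp file, and only {offset, length} stays in memory.
 */
final class ScratchStore implements AutoCloseable {
    private final File file;
    private final RandomAccessFile raf;
    private final List<long[]> index = new ArrayList<>(); // {offset, length} per stored block

    ScratchStore() throws IOException {
        file = File.createTempFile("scratch", ".bin");
        file.deleteOnExit();
        raf = new RandomAccessFile(file, "rw");
    }

    /** Appends a block to the end of the scratch file and returns its id. */
    int put(byte[] data) throws IOException {
        long offset = raf.length();
        raf.seek(offset);
        raf.write(data);
        index.add(new long[] {offset, data.length});
        return index.size() - 1;
    }

    /** Reads a previously stored block back from disk. */
    byte[] get(int id) throws IOException {
        long[] entry = index.get(id);
        byte[] buf = new byte[(int) entry[1]];
        raf.seek(entry[0]);
        raf.readFully(buf);
        return buf;
    }

    int size() {
        return index.size();
    }

    @Override
    public void close() throws IOException {
        raf.close();
        file.delete(); // best effort; deleteOnExit() is the fallback
    }

    public static void main(String[] args) throws IOException {
        try (ScratchStore store = new ScratchStore()) {
            store.put("page stream 1".getBytes(StandardCharsets.US_ASCII));
            int second = store.put("page stream 2".getBytes(StandardCharsets.US_ASCII));
            // prints "page stream 2"
            System.out.println(new String(store.get(second), StandardCharsets.US_ASCII));
        }
    }
}
```

In actual PDFBox 1.8 code you would not write anything like this yourself. Instead, you would pass an `org.apache.pdfbox.io.RandomAccess` scratch file to the `load` overload from the link. This sketch only shows why doing so reduces heap usage.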