pdfbox-users mailing list archives

From Tom Kesling <kesl...@gmail.com>
Subject Re: Questions about Streaming PDFs and Form fields
Date Fri, 17 Jan 2014 17:11:52 GMT
Thank you for the feedback.

I've tried the loadNonSeq approach combined with importing FDF into
the AcroForm.
When I do this, memory use looks far too high: it goes up by 30MB for
a simple 13k PDF.

The increase happens on the call acroForm.importFDF(fdfDocument).

Code Snippet:
// load pdf (pdfScratchFile is the temp file used for intermediate data)
pdfDocument = PDDocument.loadNonSeq(pdfFile, pdfScratchFile);

// load xfdf
fdfDocument = FDFDocument.loadXFDF(args[1]);

// get acroForm
docCatalog = pdfDocument.getDocumentCatalog();
acroForm = docCatalog.getAcroForm();
acroForm.setCacheFields(false);

// Import FDF
acroForm.importFDF(fdfDocument);

I get the impression this call requires the entire document to be loaded
into memory.
Is there a way to conserve memory and still import the FDF?
Should I avoid the call to importFDF and approach this differently, i.e.
by manually populating the acroForm?
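
For the manual route, the XFDF side can at least be read without building a DOM. The following is a minimal sketch using only the JDK's StAX parser; the class name XfdfFields and its read method are hypothetical names chosen for illustration, not part of PDFBox. It assumes the usual XFDF layout of nested <field name="..."> elements that each hold a <value>, and it joins nested names with "." to form fully qualified field names.

```java
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper (not a PDFBox class): pulls field names and values
// out of an XFDF stream one parser event at a time.
public class XfdfFields {

    /** Returns fully qualified field names mapped to their values, in document order. */
    public static Map<String, String> read(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // XFDF needs neither DTDs nor external entities, so switch them off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);

        XMLStreamReader reader = factory.createXMLStreamReader(in);
        Map<String, String> values = new LinkedHashMap<>();
        Deque<String> names = new ArrayDeque<>(); // names of the currently open <field> elements
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String tag = reader.getLocalName();
                    if ("field".equals(tag)) {
                        String name = reader.getAttributeValue(null, "name");
                        names.addLast(name == null ? "" : name);
                    } else if ("value".equals(tag) && !names.isEmpty()) {
                        // getElementText() consumes the text and stops at </value>.
                        values.put(String.join(".", names), reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "field".equals(reader.getLocalName())) {
                    names.removeLast();
                }
            }
        } finally {
            reader.close();
        }
        return values;
    }
}
```

Each entry could then be applied to the form one field at a time, for example with acroForm.getField(name) followed by setValue(value) on the returned field. Whether that actually uses less memory than importFDF is an assumption that would need to be measured.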

Let me know your thoughts.

Thanks,
T




On Fri, Jan 17, 2014 at 2:12 AM, Maruan Sahyoun <sahyoun@fileaffairs.de> wrote:

> Hi Tom,
>
> PDF is not a format that is built sequentially but a random access
> format. In order to lower the memory consumption you can pass a temp file
> which will be used to store intermediate data.
>
> Take a look at
> http://pdfbox.apache.org/docs/1.8.3/javadocs/org/apache/pdfbox/pdmodel/PDDocument.html
> especially the description of load and loadNonSeq (which is the
> preferred method).
>
> PDFStreamParser is used internally to parse PDF streams (a PDF internal
> structure).
>
> BR
>
> Maruan Sahyoun
>
> Am 17.01.2014 um 04:39 schrieb Tom Kesling <kesling@gmail.com>:
>
> > Hello,
> > I would like to ask a few questions about Streaming with PDFBox.
> >
> > I use the term Streaming for the lack of a better term.  My code will
> > execute in a JEE Container so I need to conserve memory as much as
> > possible.
> >
> > Goals:
> > I want to be able to set form fields in a PDF without loading the PDF
> > into memory.
> > I would like to stream in the PDF and set the fields as they are
> > encountered.
> > A new PDF will be streamed to disk with the populated form fields.
> >
> > I would also like to be able to read form fields from a PDF without
> > loading it into memory.
> > I would like to stream in the PDF and read the fields as they are
> > encountered.
> >
> > I've messed around with the PDFStreamParser but I haven't figured out
> > how to locate form fields.
> >
> > If anyone can give me any guidance or examples of how to do this, that
> > would help a lot.
> >
> > Any help is appreciated.
> >
> > Thanks,
> > T
>
>
