pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: Questions about Streaming PDFs and Form fields
Date Sat, 18 Jan 2014 12:36:27 GMT
Hi Tom,

it might be better to handle the form filling yourself, since importing FDF is an extra
parsing step that is not needed if you already have the data in your app.
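
To illustrate, here is a minimal sketch of filling fields directly from data the application already holds, with no FDF round trip. It uses only the JDK: `FormTarget`, `fillFields` and the field names are hypothetical and exist only to keep the example self-contained. With PDFBox 1.8 I would expect the body of `setField` to look up the field via `acroForm.getField(name)` and then call `setValue(value)` on it, but that wiring is an assumption and is not tested here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DirectFormFill {

    /** Hypothetical stand-in for the AcroForm; this is not a PDFBox type. */
    interface FormTarget {
        /** Sets one field; returns false if the form has no field with that name. */
        boolean setField(String name, String value);
    }

    /**
     * Copies each name/value pair into the form and returns the names the
     * form did not recognise, so the caller can log or reject them.
     */
    static List<String> fillFields(Map<String, String> values, FormTarget form) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> entry : values.entrySet()) {
            if (!form.setField(entry.getKey(), entry.getValue())) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // In-memory fake form with two known fields, for demonstration only.
        Map<String, String> fakeForm = new LinkedHashMap<>();
        fakeForm.put("firstName", "");
        fakeForm.put("lastName", "");
        FormTarget form = (name, value) -> {
            if (!fakeForm.containsKey(name)) {
                return false;
            }
            fakeForm.put(name, value);
            return true;
        };

        // Data the application already has; no FDF or XFDF parsing involved.
        Map<String, String> data = new LinkedHashMap<>();
        data.put("firstName", "Tom");
        data.put("nickname", "T");

        System.out.println(fillFields(data, form)); // prints [nickname]
    }
}
```

Returning the unmatched names, rather than throwing on the first one, is just one possible design; it lets a caller decide whether a missing field is an error.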

BR

Maruan Sahyoun

On 17.01.2014 at 18:11, Tom Kesling <kesling@gmail.com> wrote:

> Thank you for the feedback.
> 
> I've tried using the loadNonSeq approach combined with importing FDF into
> the acroform.
> When I do this, memory usage looks excessive: it goes up by 30 MB for a
> simple 13 KB PDF.
> 
> The increase happens on the call acroForm.importFDF(fdfDocument).
> 
> Code Snippet:
> // load pdf, using the scratch file for intermediate data
> pdfDocument = PDDocument.loadNonSeq(pdfFile, pdfScratchFile);
> 
> // load xfdf
> fdfDocument = FDFDocument.loadXFDF(args[1]);
> 
> // get acroForm
> docCatalog = pdfDocument.getDocumentCatalog();
> acroForm = docCatalog.getAcroForm();
> acroForm.setCacheFields(false);
> 
> // Import FDF
> acroForm.importFDF(fdfDocument);
> 
> I get the impression this call requires the entire document to be loaded
> into memory.
> Is there a way to conserve memory and still import the FDF?
> Should I avoid the call to importFDF and approach this differently, i.e.
> by populating the acroForm manually?
> 
> Let me know your thoughts.
> 
> Thanks,
> T
> 
> 
> 
> 
> On Fri, Jan 17, 2014 at 2:12 AM, Maruan Sahyoun <sahyoun@fileaffairs.de> wrote:
> 
>> Hi Tom,
>> 
>> PDF is not a format which is built sequentially but a random-access
>> format. In order to lower the memory consumption you can pass a temp file
>> which will be used to store intermediate data.
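
To make the random-access point concrete: a PDF reader starts at the end of the file, where the `startxref` keyword gives the byte offset of the cross-reference table, and jumps from there to the objects it needs. The sketch below uses only the JDK, not PDFBox. The class name `StartXrefLocator` and the 1024-byte tail window are illustrative choices of mine, and the code makes no attempt to handle damaged files or to follow incremental updates.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class StartXrefLocator {

    /**
     * Reads only the tail of the file and returns the byte offset that
     * follows the last "startxref" keyword, or -1 if none is found.
     */
    static long findStartXref(RandomAccessFile file) throws IOException {
        long length = file.length();
        int tailSize = (int) Math.min(1024, length); // assumed window size
        byte[] tail = new byte[tailSize];
        file.seek(length - tailSize);
        file.readFully(tail);

        // ISO-8859-1 maps each byte to exactly one char, so indexes line up.
        String text = new String(tail, StandardCharsets.ISO_8859_1);
        int keyword = text.lastIndexOf("startxref");
        if (keyword < 0) {
            return -1;
        }

        // Skip the whitespace or line break between the keyword and the number.
        int i = keyword + "startxref".length();
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && Character.isDigit(text.charAt(i))) {
            i++;
        }
        return start == i ? -1 : Long.parseLong(text.substring(start, i));
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the PDF to inspect.
        try (RandomAccessFile file = new RandomAccessFile(args[0], "r")) {
            System.out.println("xref offset: " + findStartXref(file));
        }
    }
}
```

Only the last kilobyte is read here, which is why a purely sequential, field-by-field pass over the file does not match how the format is laid out.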
>> 
>> Take a look at
>> http://pdfbox.apache.org/docs/1.8.3/javadocs/org/apache/pdfbox/pdmodel/PDDocument.html
>> especially the descriptions of load and loadNonSeq (which is the
>> preferred method).
>> 
>> PDFStreamParser is used internally to parse PDF streams (a PDF internal
>> structure).
>> 
>> BR
>> 
>> Maruan Sahyoun
>> 
>> On 17.01.2014 at 04:39, Tom Kesling <kesling@gmail.com> wrote:
>> 
>>> Hello,
>>> I would like to ask a few questions about Streaming with PDFBox.
>>> 
>>> I use the term Streaming for lack of a better term.  My code will
>>> execute in a JEE Container so I need to conserve memory as much as
>>> possible.
>>> 
>>> Goals:
>>> I want to be able to set form fields in a PDF without loading the PDF
>>> into memory.
>>> I would like to stream in the PDF and set the fields as they are
>>> encountered.
>>> A new PDF will be streamed to disk with the populated form fields.
>>> 
>>> I would also like to be able to read form fields from a PDF without
>>> loading it into memory.
>>> I would like to stream in the PDF and read the fields as they are
>>> encountered.
>>> 
>>> I've messed around with the PDFStreamParser but I haven't figured out
>>> how to locate form fields.
>>> 
>>> If anyone can give me any guidance or examples of how to do this, that
>>> would help a lot.
>>> 
>>> Any help is appreciated.
>>> 
>>> Thanks,
>>> T
>> 
>> 

