pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Splitter.createNewDocument() always uses main memory only - this leads to out of memory when splitting large documents
Date Thu, 13 Jul 2017 19:21:21 GMT
See
https://issues.apache.org/jira/browse/PDFBOX-3869

and try a snapshot from
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.7-SNAPSHOT/
(at the bottom)

Please let me know whether this is what you wanted, and please do it
quickly: a new version will be built on Monday, so either I'd have to
revert before then or we'll be stuck with this API.

Re: a global configuration - maybe at a later time. I'm not THAT 
convinced that it is needed.

Tilman


On 13.07.2017 at 09:20, D.Hamann@aurenz.de wrote:
> Hi dear contributors to pdfbox,
>
> I would just like to report that Splitter.createNewDocument() should be able to consider
> different MemoryUsageSetting configurations.
>
> In version 2.0.6 this method is implemented as
>
>
> protected PDDocument createNewDocument() throws IOException
>      {
>          PDDocument document = new PDDocument();
>          document.getDocument().setVersion(getSourceDocument().getVersion());
>          document.setDocumentInformation(getSourceDocument().getDocumentInformation());
>          document.getDocumentCatalog().setViewerPreferences(
>                  getSourceDocument().getDocumentCatalog().getViewerPreferences());
>          return document;
>      }
>
>
>
> I would suggest introducing a member variable "MemoryUsageSetting memSetting" that can
> be set for each instance of "Splitter".
>
> This way createNewDocument() could be implemented as
>
>
> protected PDDocument createNewDocument() throws IOException
>      {
>          PDDocument document = new PDDocument(this.memSetting);
>          document.getDocument().setVersion(getSourceDocument().getVersion());
>          document.setDocumentInformation(getSourceDocument().getDocumentInformation());
>          document.getDocumentCatalog().setViewerPreferences(
>                  getSourceDocument().getDocumentCatalog().getViewerPreferences());
>          return document;
>      }
>
>
> Thankfully createNewDocument() is not private, so I could override this method in my
> child class, as I also did for "protected void processPage()" (just FYI: to create process
> messages).
>
>
> Please have a look at "PDFMergerUtility.mergeDocuments()", which has been deprecated since
> MemoryUsageSetting was introduced. The use of
> "PDFMergerUtility.mergeDocuments(MemoryUsageSetting memUsageSetting)" is now encouraged.
>
>
> By the way: The utility "PDFSplit" would have to be updated to pass a configured MemoryUsageSetting
> to "Splitter" - otherwise this tool relies on main memory only.
>
> Perhaps it would be a good idea to be able to define a "pdfbox-wide" basic MemoryUsageSetting
> that could be used everywhere as a fallback. This way the default constructor of PDDocument
> could be changed from its implementation in version 2.0.6
>
> public PDDocument()
>      {
>          this(MemoryUsageSetting.setupMainMemoryOnly());
>      }
>
>
> to something like
>
>
> public PDDocument()
>      {
>          this(MemoryUsageSetting.asConfigured());
>      }
>
>
>
> Regards,
>
> Daniel
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
>
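
For anyone still on 2.0.6, the subclass workaround described in the quoted
message could look roughly like the sketch below. It assumes PDFBox 2.0.x on
the classpath. The class name MemoryAwareSplitter and its constructor are
made up for illustration and are not part of PDFBox; the method body is the
one quoted above, with only the PDDocument constructor call changed.

```java
import java.io.IOException;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

// Hypothetical helper class (not part of PDFBox): a Splitter whose
// destination documents are created with a caller-supplied MemoryUsageSetting.
class MemoryAwareSplitter extends Splitter
{
    private final MemoryUsageSetting memSetting;

    MemoryAwareSplitter(MemoryUsageSetting memSetting)
    {
        this.memSetting = memSetting;
    }

    @Override
    protected PDDocument createNewDocument() throws IOException
    {
        // Same steps as the 2.0.6 implementation quoted above, except that
        // the new document is created with the configured memory setting
        // instead of the main-memory-only default constructor.
        PDDocument document = new PDDocument(memSetting);
        document.getDocument().setVersion(getSourceDocument().getVersion());
        document.setDocumentInformation(getSourceDocument().getDocumentInformation());
        document.getDocumentCatalog().setViewerPreferences(
                getSourceDocument().getDocumentCatalog().getViewerPreferences());
        return document;
    }
}
```

It would be used like a plain Splitter, for example
new MemoryAwareSplitter(MemoryUsageSetting.setupTempFileOnly()).split(source),
and each returned PDDocument still has to be closed by the caller.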



