pdfbox-users mailing list archives

From Gilad Denneboom <gilad.denneb...@gmail.com>
Subject Re: Splitter.createNewDocument() always uses main memory only - this leads to out of memory when splitting large documents
Date Fri, 14 Jul 2017 09:02:31 GMT
You don't need a decompiler... PDFBox is an open-source library. All the
code is available online.

On Fri, Jul 14, 2017 at 10:39 AM, <D.Hamann@aurenz.de> wrote:

> Hi Tilman,
>
> I used a decompiler to have a look at the sources.
>
> Perhaps it would be a good idea to mark the no-argument Splitter()
> constructor as deprecated
>
>             @Deprecated
>             public Splitter() {}
>
>             public Splitter(MemoryUsageSetting memoryUsageSetting) {
>                 this.memoryUsageSetting = memoryUsageSetting;
>             }
>
>
> to point people to the improvement before they run into out-of-memory
> errors themselves.
>
>
> Please add a program argument to PDFSplit.split() like so:
>
>             if (args[i].equals("-memory")) {
>                 if (++i >= args.length) {
>                     PDFSplit.usage();
>                 }
>                 if (args[i].equals("tempFile")) {
>                     memoryUsageSetting = .........
>                 } else if (args[i].equals("mainMemory")) {
>                     memoryUsageSetting = .........
>                 } else if (args[i].equals("mixed")) {
>                     memoryUsageSetting = .........
>                 } else {
>                     PDFSplit.usage();
>                 }
>                 continue;
>             }
>
> Perhaps it would be a good idea to even make "maxMainMemoryBytes" and
> "maxStorageBytes" configurable, too.
>
> Thanks a lot - I really appreciate your great work and support!
>
> Cheers,
>
> Daniel
>
>
> -----Original Message-----
> From: Tilman Hausherr [mailto:THausherr@t-online.de]
> Sent: Thursday, 13 July 2017 21:21
> To: users@pdfbox.apache.org
> Subject: Re: Splitter.createNewDocument() always uses main memory only -
> this leads to out of memory when splitting large documents
>
> See
> https://issues.apache.org/jira/browse/PDFBOX-3869
>
> and try a snapshot from
> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.7-SNAPSHOT/
> (at the bottom)
>
> Please give feedback on whether this is what you wanted. Please do it quickly,
> because a new version will be built on Monday, so either I'd have to revert
> before then or we'll be stuck with this API.
>
> Re: a global configuration - maybe at a later time. I'm not THAT convinced
> that it is needed.
>
> Tilman
>
>
> On 13.07.2017 at 09:20, D.Hamann@aurenz.de wrote:
> > Hi dear contributors to pdfbox,
> >
> > I would just like to report that Splitter.createNewDocument() should be
> > able to take different MemoryUsageSetting configurations into account.
> >
> > In version 2.0.6 this method is implemented as
> >
> >
> > protected PDDocument createNewDocument() throws IOException
> > {
> >     PDDocument document = new PDDocument();
> >     document.getDocument().setVersion(getSourceDocument().getVersion());
> >     document.setDocumentInformation(getSourceDocument().getDocumentInformation());
> >     document.getDocumentCatalog().setViewerPreferences(
> >             getSourceDocument().getDocumentCatalog().getViewerPreferences());
> >     return document;
> > }
> >
> >
> >
> > I would suggest introducing a member variable "MemoryUsageSetting
> > memSetting" that can be set for each instance of "Splitter".
> >
> > This way createNewDocument() could be implemented as
> >
> >
> > protected PDDocument createNewDocument() throws IOException
> > {
> >     PDDocument document = new PDDocument(this.memSetting);
> >     document.getDocument().setVersion(getSourceDocument().getVersion());
> >     document.setDocumentInformation(getSourceDocument().getDocumentInformation());
> >     document.getDocumentCatalog().setViewerPreferences(
> >             getSourceDocument().getDocumentCatalog().getViewerPreferences());
> >     return document;
> > }
> >
> >
> > Thankfully createNewDocument() is not private, so I could override
> > this method in my child class (as I did for "protected void
> > processPage()", too, just FYI, to create process messages).
> >
> >
> > Please have a look at "PDFMergerUtility.mergeDocuments()", which has been
> > deprecated since MemoryUsageSetting was introduced. Now, the use of
> > "PDFMergerUtility.mergeDocuments(MemoryUsageSetting memUsageSetting)" is
> > encouraged.
> >
> >
> > By the way: The utility "PDFSplit" would have to be updated to pass a
> > configured MemoryUsageSetting to "Splitter" - otherwise this tool relies on
> > main memory only.
> >
> > Perhaps it would be a good thing to be able to define a "pdfbox-wide"
> > basic MemoryUsageSetting which could be used everywhere as a fallback.
> > This way the default constructor of PDDocument could be changed from
> >
> > its implementation in version 2.0.6
> >
> > public PDDocument()
> >      {
> >          this(MemoryUsageSetting.setupMainMemoryOnly());
> >      }
> >
> >
> > to something like
> >
> >
> > public PDDocument()
> >      {
> >          this(MemoryUsageSetting.asConfigured());
> >      }
> >
> >
> >
> > Regards,
> >
> > Daniel
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> > For additional commands, e-mail: users-help@pdfbox.apache.org
> >
>
>
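
The "-memory" option proposed in the quoted message can be sketched without PDFBox on the classpath. This is only an illustration under stated assumptions: the class MemoryOptionSketch, the enum MemoryMode, and the method parseMemoryMode are hypothetical names that appear neither in the thread nor in PDFBox, and defaulting to main memory when the flag is absent is an assumption modelled on the 2.0.6 behaviour described above. In PDFBox 2.0.x the three modes would correspond to MemoryUsageSetting.setupTempFileOnly(), MemoryUsageSetting.setupMainMemoryOnly(), and MemoryUsageSetting.setupMixed(long).

```java
// Hypothetical sketch: parses "-memory <mode>" the way the quoted proposal
// outlines, but maps to a local enum instead of a real MemoryUsageSetting.
class MemoryOptionSketch {

    /** Hypothetical stand-in for the three MemoryUsageSetting choices. */
    public enum MemoryMode { TEMP_FILE, MAIN_MEMORY, MIXED }

    /**
     * Scans the arguments for "-memory <mode>" and returns the selected mode.
     * Returns MAIN_MEMORY when the flag is absent (assumed default).
     * Throws IllegalArgumentException for a missing or unknown value, where
     * the proposal in the thread would call PDFSplit.usage() instead.
     */
    public static MemoryMode parseMemoryMode(String[] args) {
        MemoryMode mode = MemoryMode.MAIN_MEMORY;
        for (int i = 0; i < args.length; i++) {
            if (!args[i].equals("-memory")) {
                continue; // not our flag; other options are ignored here
            }
            if (++i >= args.length) {
                throw new IllegalArgumentException("-memory requires a value");
            }
            switch (args[i]) {
                case "tempFile":
                    mode = MemoryMode.TEMP_FILE;
                    break;
                case "mainMemory":
                    mode = MemoryMode.MAIN_MEMORY;
                    break;
                case "mixed":
                    mode = MemoryMode.MIXED;
                    break;
                default:
                    throw new IllegalArgumentException(
                            "unknown -memory value: " + args[i]);
            }
        }
        return mode;
    }

    public static void main(String[] args) {
        System.out.println(parseMemoryMode(args));
    }
}
```

Throwing an exception rather than exiting keeps the parsing step testable on its own; a command-line tool could catch it and print its usage text.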
