pdfbox-users mailing list archives

From Andreas Lehmkühler <andr...@lehmi.de>
Subject Re: AW: Splitter.createNewDocument() always uses main memory only - this leads to out of memory when splitting large documents
Date Fri, 14 Jul 2017 09:17:40 GMT
You are looking in the wrong place. pdfbox-app is just a meta project that creates a convenience
binary of all relevant subprojects. It doesn't contain any source code.

The source code you are looking for is here:

https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/2.0.7-SNAPSHOT/

Andreas

> D.Hamann@aurenz.de wrote on 14 July 2017 at 11:05:
> 
> 
> Hi,
> 
> I am talking about the snapshot versions provided here:
> 
> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.7-SNAPSHOT/
> 
> Can you tell me where to download jars containing source files? The source jars there
> just contain the META-INF directory but nothing else.
> 
> Thank you!
> 
> -----Original Message-----
> From: Gilad Denneboom [mailto:gilad.denneboom@gmail.com]
> Sent: Friday, 14 July 2017 11:03
> To: users@pdfbox.apache.org
> Subject: Re: Splitter.createNewDocument() always uses main memory only - this leads to
> out of memory when splitting large documents
> 
> You don't need a decompiler... PDFBox is an open-source library. All the code is available
> online.
> 
> On Fri, Jul 14, 2017 at 10:39 AM, <D.Hamann@aurenz.de> wrote:
> 
> > Hi Tilman,
> >
> > I used a decompiler to have a look at the sources.
> >
> > Perhaps it would be a good idea to mark Splitter() as deprecated
> >
> >             @Deprecated
> >             public Splitter() {}
> >
> >             public Splitter(MemoryUsageSetting memoryUsageSetting) {
> >                 this.memoryUsageSetting = memoryUsageSetting;
> >             }
> >
> >
> > to point people to the improvement before they fall into the out of 
> > memory hole themselves.
> >
> >
> > Please add a program argument to PDFSplit.split() like so:
> >
> >            if (args[i].equals("-memory")) {
> >                 if (++i >= args.length) {
> >                     PDFSplit.usage();
> >                 }
> >                 if (args[i].equals("tempFile")) {
> >                           memoryUsageSetting = .........
> >                 } else if (args[i].equals("mainMemory")) {
> >                           memoryUsageSetting = .........
> >                 } else if (args[i].equals("mixed")) {
> >                           memoryUsageSetting = .........
> >                 } else {
> >                       PDFSplit.usage();
> >                 }
> >                 continue;
> >             }
> >
> > Perhaps it would be a good idea to even make "maxMainMemoryBytes" and 
> > "maxStorageBytes" configurable, too.
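The `-memory` option proposed above could be parsed along these lines. This is only a sketch under stated assumptions: `MemoryOption`, its `Mode` enum and the optional byte limit after `mixed` are hypothetical names chosen for illustration, not part of PDFBox or PDFSplit, and the sketch throws an exception where the proposal calls `PDFSplit.usage()`.

```java
/**
 * Hypothetical helper (not part of PDFBox): parses
 * "-memory tempFile|mainMemory|mixed [maxMainMemoryBytes]" from the arguments.
 */
final class MemoryOption {
    enum Mode { TEMP_FILE, MAIN_MEMORY, MIXED }

    final Mode mode;
    /** Main-memory limit in bytes for MIXED; -1 means "not given". */
    final long maxMainMemoryBytes;

    private MemoryOption(Mode mode, long maxMainMemoryBytes) {
        this.mode = mode;
        this.maxMainMemoryBytes = maxMainMemoryBytes;
    }

    static MemoryOption parse(String[] args) {
        // Main memory is the fallback, mirroring PDDocument() in version 2.0.6.
        Mode mode = Mode.MAIN_MEMORY;
        long maxMain = -1;
        for (int i = 0; i < args.length; i++) {
            if (!args[i].equals("-memory")) {
                continue; // other arguments are ignored in this sketch
            }
            if (++i >= args.length) {
                throw new IllegalArgumentException("-memory requires a value");
            }
            switch (args[i]) {
                case "tempFile":
                    mode = Mode.TEMP_FILE;
                    break;
                case "mainMemory":
                    mode = Mode.MAIN_MEMORY;
                    break;
                case "mixed":
                    mode = Mode.MIXED;
                    // Assumption: an optional numeric byte limit may follow "mixed".
                    if (i + 1 < args.length && args[i + 1].matches("\\d+")) {
                        maxMain = Long.parseLong(args[++i]);
                    }
                    break;
                default:
                    throw new IllegalArgumentException("unknown -memory value: " + args[i]);
            }
        }
        return new MemoryOption(mode, maxMain);
    }
}
```

In PDFSplit, the three modes would presumably map to `MemoryUsageSetting.setupTempFileOnly()`, `MemoryUsageSetting.setupMainMemoryOnly()` and `MemoryUsageSetting.setupMixed(maxMainMemoryBytes)`.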
> >
> > Thanks a lot - I really appreciate your great work and support!
> >
> > Cheers,
> >
> > Daniel
> >
> >
> > -----Original Message-----
> > From: Tilman Hausherr [mailto:THausherr@t-online.de]
> > Sent: Thursday, 13 July 2017 21:21
> > To: users@pdfbox.apache.org
> > Subject: Re: Splitter.createNewDocument() always uses main memory only 
> > - this leads to out of memory when splitting large documents
> >
> > See
> > https://issues.apache.org/jira/browse/PDFBOX-3869
> >
> > and try a snapshot from
> > https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.7-SNAPSHOT/
> > (at the bottom)
> >
> > Please give feedback on whether this is what you wanted. Please do it 
> > quickly because a new version will be built on Monday, so either I'd 
> > have to revert before then or we'll be stuck with this API.
> >
> > Re: a global configuration - maybe at a later time. I'm not THAT 
> > convinced that it is needed.
> >
> > Tilman
> >
> >
> > On 13.07.2017 at 09:20, D.Hamann@aurenz.de wrote:
> > > Hi dear contributors to pdfbox,
> > >
> > > I would just like to report that Splitter.createNewDocument() should be able to
> > > consider different MemoryUsageSetting configurations.
> > >
> > > In version 2.0.6 this method is implemented as
> > >
> > >
> > > protected PDDocument createNewDocument() throws IOException
> > > {
> > >     PDDocument document = new PDDocument();
> > >     document.getDocument().setVersion(getSourceDocument().getVersion());
> > >     document.setDocumentInformation(getSourceDocument().getDocumentInformation());
> > >     document.getDocumentCatalog().setViewerPreferences(
> > >             getSourceDocument().getDocumentCatalog().getViewerPreferences());
> > >     return document;
> > > }
> > >
> > >
> > >
> > > I would suggest introducing a member variable "MemoryUsageSetting memSetting"
> > > that can be set for each instance of "Splitter".
> > >
> > > This way createNewDocument() could be implemented as
> > >
> > >
> > > protected PDDocument createNewDocument() throws IOException
> > > {
> > >     PDDocument document = new PDDocument(this.memSetting);
> > >     document.getDocument().setVersion(getSourceDocument().getVersion());
> > >     document.setDocumentInformation(getSourceDocument().getDocumentInformation());
> > >     document.getDocumentCatalog().setViewerPreferences(
> > >             getSourceDocument().getDocumentCatalog().getViewerPreferences());
> > >     return document;
> > > }
> > >
> > >
> > > Thankfully createNewDocument() is not private, so I could override 
> > > this method in my child class, as I also did for "protected void 
> > > processPage()" (just FYI: to create process messages).
> > >
> > >
> > > Please have a look at "PDFMergerUtility.mergeDocuments()", which has been deprecated
> > > since MemoryUsageSetting was introduced. Now, the use of
> > > "PDFMergerUtility.mergeDocuments(MemoryUsageSetting memUsageSetting)" is encouraged.
> > >
> > >
> > > By the way: The utility "PDFSplit" would have to be updated to pass a configured
> > > MemoryUsageSetting to "Splitter" - otherwise this tool relies on main memory only.
> > >
> > > Perhaps it would be a good thing to be able to define a "pdfbox-wide"
> > > basic MemoryUsageSetting which could be used everywhere as a fallback.
> > > This way the default constructor of PDDocument could be changed from
> > >
> > > its implementation in version 2.0.6
> > >
> > > public PDDocument()
> > >      {
> > >          this(MemoryUsageSetting.setupMainMemoryOnly());
> > >      }
> > >
> > >
> > > to something like
> > >
> > >
> > > public PDDocument()
> > >      {
> > >          this(MemoryUsageSetting.asConfigured());
> > >      }
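One way to picture the suggested "pdfbox-wide" fallback is a default that is read from a JVM system property. The sketch below is an illustration under assumptions only: `MemoryUsageSetting.asConfigured()` is the poster's proposal and does not exist in PDFBox 2.0.6, and both the class `DefaultMemoryMode` and the property name `pdfbox.memory.default` are invented here.

```java
/**
 * Hypothetical stand-in for the proposed "MemoryUsageSetting.asConfigured()".
 * Nothing in this class is part of PDFBox.
 */
final class DefaultMemoryMode {
    /** Invented property name, used only for this sketch. */
    static final String PROPERTY = "pdfbox.memory.default";

    private DefaultMemoryMode() {
    }

    /**
     * Returns "tempFile", "mainMemory" or "mixed". Unset or unrecognised
     * values fall back to "mainMemory", the behaviour of PDDocument() in 2.0.6.
     */
    static String resolve() {
        String value = System.getProperty(PROPERTY, "mainMemory");
        switch (value) {
            case "tempFile":
            case "mainMemory":
            case "mixed":
                return value;
            default:
                return "mainMemory";
        }
    }
}
```

A caller could then start the JVM with `-Dpdfbox.memory.default=tempFile` and have every default-constructed document pick up that choice, which is the effect the proposal is after.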
> > >
> > >
> > >
> > > Regards,
> > >
> > > Daniel
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> > > For additional commands, e-mail: users-help@pdfbox.apache.org