pdfbox-users mailing list archives

From Andreas Lehmkühler <andr...@lehmi.de>
Subject Re: Re[8]: PDFRenderer, PDDocument memory issue
Date Wed, 01 Jul 2015 12:08:30 GMT


> Alex Sviridov <ooo_saturn7@mail.ru> wrote on 1 July 2015 at 13:59:
> 
> 
>  OK, thank you very much for the explanation. Could you tell me where this
> scratch file is located on Linux/Windows?
PDFBox creates that file with java.io.File.createTempFile, which uses the default
temp directory. On Linux that's "/tmp". I'm not sure about Windows, as different
environment variables (TMP, TEMP, USERPROFILE, ...) are used to search for such
a directory.

You can define your own temp directory by passing the following parameter when
starting your application:

-Djava.io.tmpdir=PATH-TO-YOUR-TEMP
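A quick way to check where temporary files actually end up on a given machine is to compare the java.io.tmpdir property with the parent directory of a probe file created by File.createTempFile. A minimal sketch using only the JDK; the class name TempDirInfo is made up for illustration:

```java
import java.io.File;
import java.io.IOException;

/** Shows where File.createTempFile puts files by default. */
public class TempDirInfo {

    /** The directory named by the java.io.tmpdir system property. */
    public static File configuredTempDir() {
        return new File(System.getProperty("java.io.tmpdir"));
    }

    /** Creates (and immediately deletes) a probe file to see where temp files really land. */
    public static File actualTempDir() throws IOException {
        File probe = File.createTempFile("probe", ".tmp");
        try {
            return probe.getParentFile();
        } finally {
            probe.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("java.io.tmpdir   = " + configuredTempDir());
        System.out.println("temp files go to " + actualTempDir());
    }
}
```

Running it as java -Djava.io.tmpdir=PATH-TO-YOUR-TEMP TempDirInfo should print your directory on both lines. As far as I know, the JDK reads java.io.tmpdir only once, on the first createTempFile call, so setting it on the command line is more reliable than calling System.setProperty later.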


> 
> 
> Wednesday, 1 July 2015, 13:54 +02:00, from Andreas Lehmkühler <andreas@lehmi.de>:
> >> Alex Sviridov <ooo_saturn7@mail.ru> wrote on 1 July 2015 at 13:38:
> >> 
> >> 
> >>  The file is here: https://yadi.sk/i/Y0fTuvHmhbZiE
> >Ah, that explains a lot. The PDF is a scanned document; every page holds a
> >color image, which consumes a lot of memory when processed.
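To get a feel for the numbers, here is some rough arithmetic as a small Java sketch. Assumptions (none of them stated in the thread): an A4 page of about 595 x 842 points (1 point = 1/72 inch), output bitmaps with 4 bytes per pixel as used by BufferedImage.TYPE_INT_RGB (I assume ImageType.RGB maps to it), and rounding down to whole pixels. The class name RasterMemory is made up.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;

/** Rough raster-size arithmetic for rendering a page to an RGB bitmap. */
public class RasterMemory {

    /** Pixels along one edge: PDF units are 1/72 inch, so px = pt * dpi / 72, rounded down. */
    public static int pixels(double points, double dpi) {
        return (int) Math.floor(points * dpi / 72.0);
    }

    /** Bytes for an int-packed RGB bitmap (4 bytes per pixel) of the given page size. */
    public static long rgbBytes(double widthPt, double heightPt, double dpi) {
        return 4L * pixels(widthPt, dpi) * pixels(heightPt, dpi);
    }

    public static void main(String[] args) {
        // Approximate A4 page size in points.
        System.out.println("12 dpi thumbnail: " + rgbBytes(595, 842, 12) + " bytes");
        System.out.println("300 dpi page:     " + rgbBytes(595, 842, 300) + " bytes");

        // Cross-check: a real TYPE_INT_RGB image stores one int per pixel.
        BufferedImage thumb = new BufferedImage(
                pixels(595, 12), pixels(842, 12), BufferedImage.TYPE_INT_RGB);
        int ints = ((DataBufferInt) thumb.getRaster().getDataBuffer()).getSize();
        System.out.println("BufferedImage holds " + (4L * ints) + " bytes");
    }
}
```

Under these assumptions a 12 DPI thumbnail needs only about 55 KB, while a 300 DPI render needs about 35 MB. So the thumbnails themselves are small; the memory presumably goes into decoding each page's full-resolution color scan, which has to happen whatever output DPI is requested.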
> >
> >> I tried load(fileName, true). The result: now I don't have memory
> >> problems. However, now I have two other problems:
> >>
> >> 1) All the thumbnail images are loaded. However, it is VERY SLOW: each
> >> thumbnail image takes about 4 seconds to load!
> >When it comes to huge PDFs, you have to pick your poison. Either you provide
> >enough memory to do all the work in memory (fast), or you use a scratch file
> >to save memory (slow).
> >
> >And yes, there is room for improving the memory handling in PDFBox (read on
> >demand, remove after usage), but that is a future feature. Patches are
> >welcome.
> >
> >> 2) Also, as you can see, the thumbnail images are loaded in a separate
> >> thread. While this thread is running, if I try to get the big image for
> >> the main content using
> >> BufferedImage bi=pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
> >> I get the following exception:
> >> 
> >> java.io.IOException: java.util.zip.DataFormatException: unknown compression method
> >>     at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
> >>     at org.apache.pdfbox.cos.COSStream.attemptDecode(COSStream.java:422)
> >>     at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:398)
> >>     at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:335)
> >>     at org.apache.pdfbox.cos.COSStream.checkUnfilteredBuffer(COSStream.java:265)
> >>     at org.apache.pdfbox.cos.COSStream.getUnfilteredRandomAccess(COSStream.java:239)
> >>     at org.apache.pdfbox.pdfparser.BaseParser.<init>(BaseParser.java:146)
> >>     at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:78)
> >>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:451)
> >>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:438)
> >>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
> >>     at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:180)
> >>     at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:205)
> >>     at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:136)
> >>     at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:95)
> >>   ....
> >>     at javafx.concurrent.Task$TaskCallable.call(Task.java:1423)
> >>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >>     at java.lang.Thread.run(Thread.java:745)
> >> Caused by: java.util.zip.DataFormatException: unknown compression method
> >>     at java.util.zip.Inflater.inflateBytes(Native Method)
> >>     at java.util.zip.Inflater.inflate(Inflater.java:259)
> >>     at java.util.zip.Inflater.inflate(Inflater.java:280)
> >>     at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:101)
> >>     at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:74)
> >>     ... 20 more
> >> 
> >> How can I solve these problems?
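> >PDFBox isn't designed to be thread safe, so don't access the same PDDocument
> >(or a PDFRenderer built on it) from several threads at once.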
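Since PDFBox isn't thread safe, one way to let a thumbnail thread and the main view share one document is to funnel every render call through a single worker thread. A minimal sketch, assuming a hypothetical PageRenderer interface that would wrap something like pdfRenderer.renderImageWithDPI(page, dpi, ImageType.RGB); it uses only java.util.concurrent, not any PDFBox API:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Funnels all rendering for one document through a single worker thread,
 * so a non-thread-safe renderer is never entered by two threads at once.
 */
public class SerialRenderer<T> implements AutoCloseable {

    /** Hypothetical stand-in for "render page N of this document at this DPI". */
    public interface PageRenderer<R> {
        R render(int pageIndex, float dpi) throws Exception;
    }

    private final PageRenderer<T> renderer;
    // One worker thread: jobs run one after another, in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    public SerialRenderer(PageRenderer<T> renderer) {
        this.renderer = renderer;
    }

    /** Thumbnail jobs and full-size jobs are both queued here; safe to call from any thread. */
    public Future<T> submit(int pageIndex, float dpi) {
        Callable<T> job = () -> renderer.render(pageIndex, dpi);
        return worker.submit(job);
    }

    /** Lets already queued jobs finish, then stops the worker thread. */
    @Override
    public void close() {
        worker.shutdown();
    }
}
```

The thumbnail loop and the 300 DPI request would both call submit(...), so their renders queue up instead of overlapping. The trade-off is that a large render waits behind queued thumbnails; opening a separate PDDocument per thread would avoid that, at the cost of more memory.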
> >
> >> 
> >> 
> >> Wednesday, 1 July 2015, 13:17 +02:00, from Andreas Lehmkühler <andreas@lehmi.de>:
> >> >
> >> >
> >> >> Alex Sviridov <ooo_saturn7@mail.ru> wrote on 1 July 2015 at 13:09:
> >> >> 
> >> >> 
> >> >>  I decided to show all the code. I'm also sending the PDF file, a file
> >> >> from the internet that I use for testing.
> >> >The attachment didn't make it due to restrictions on the mailing list.
> >> >Please post a link to the original source or another place where we can
> >> >download the PDF in question.
> >> >
> >> >> 
> >> >> Task task = new Task() {
> >> >>     @Override protected Integer call() throws Exception {
> >> >>         for (int i=0;i<model.getTotalPages();i++){
> >> >>             System.out.println("Point a:"+i);
> >> >>             WritableImage writableImage=model.getPageThumbImage(i);
> >> >>             System.out.println("Point b:"+i);
> >> >>             ImageView imageView=new ImageView(writableImage);
> >> >>             System.out.println("Point c:"+i);
> >> >>             Label label=new Label(Integer.toString(i+1));
> >> >>             System.out.println("Point d:"+i);
> >> >>             VBox vBox=new VBox(imageView,label);
> >> >>             System.out.println("Point e:"+i);
> >> >>             vBox.setAlignment(Pos.CENTER);
> >> >>             vBox.setStyle("-fx-padding:5px 5px 5px 5px;-fx-background-color:red");
> >> >>             System.out.println("Point f:"+i);
> >> >>             Platform.runLater(new Runnable() {
> >> >>                 @Override
> >> >>                 public void run() {
> >> >>                      thumbFlowPane.getChildren().add(vBox);
> >> >>                 }
> >> >>             });
> >> >>         }
> >> >>         return null;
> >> >>     }
> >> >> };
> >> >> new Thread(task).start();
> >> >> 
> >> >> And here is the tail of the output
> >> >> ....
> >> >> Point a:30
> >> >> Point b:30
> >> >> Point c:30
> >> >> Point d:30
> >> >> Point e:30
> >> >> Point f:30
> >> >> Point a:31
> >> >> 
> >> >> What is a scratch file? Sorry, I don't understand.
> >> >
> >> >PDFBox holds a lot of temporary data in memory. To reduce the memory
> >> >footprint, you can choose to use a scratch file instead, so that some or
> >> >most of that data is held in a file.
> >> >
> >> >To do so, simply use another load method, e.g. 
> >> >
> >> >load(File file, boolean useScratchFiles)
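The idea behind a scratch file can be illustrated without PDFBox: bytes are written to a temporary file and read back by offset when needed, so they don't occupy the Java heap. This is only a conceptual sketch using java.io.RandomAccessFile and the same File.createTempFile default directory mentioned earlier in the thread; it is not PDFBox code, PDFBox's own implementation is different and more elaborate, and the class name ScratchBuffer is made up.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Conceptual sketch of a scratch file: data is parked on disk instead of the heap. */
public class ScratchBuffer implements AutoCloseable {
    private final File file;
    private final RandomAccessFile raf;

    /** dir == null means the default temp directory (java.io.tmpdir). */
    public ScratchBuffer(File dir) throws IOException {
        this.file = File.createTempFile("scratch", ".tmp", dir);
        this.raf = new RandomAccessFile(file, "rw");
    }

    /** Appends data at the end of the file and returns the offset it was written to. */
    public long append(byte[] data) throws IOException {
        long offset = raf.length();
        raf.seek(offset);
        raf.write(data);
        return offset;
    }

    /** Reads len bytes starting at offset; every call goes through file I/O, hence slower than RAM. */
    public byte[] read(long offset, int len) throws IOException {
        byte[] out = new byte[len];
        raf.seek(offset);
        raf.readFully(out);
        return out;
    }

    public File file() {
        return file;
    }

    /** Closes and deletes the file, since scratch data is only temporary. */
    @Override
    public void close() throws IOException {
        raf.close();
        if (!file.delete()) {
            file.deleteOnExit();
        }
    }
}
```

Paying for file I/O on every access instead of reading from the heap is consistent with the slowdown reported in this thread once scratch files were enabled.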
> >> >> 
> >> >> 
> >> >> 
> >> >> 
> >> >> 
> >> >> 
> >> >> Wednesday, 1 July 2015, 13:04 +02:00, from Andreas Lehmkühler <andreas@lehmi.de>:
> >> >> >
> >> >> >
> >> >> >> Alex Sviridov <ooo_saturn7@mail.ru> wrote on 1 July 2015 at 12:58:
> >> >> >> 
> >> >> >> 
> >> >> >>  Thank you for your answer. I tried
> >> >> >> pdfbox-app-2.0.0-20150630.220424-1464.jar; the result is the same.
> >> >> >> 
> >> >> >> When I create the images, I add them to a JavaFX FlowPane. However, the
> >> >> >> problem is not the images because, I repeat, I get 400 MB back when I
> >> >> >> set pdfDocument=null, pdfRenderer=null.
> >> >> >> 
> >> >> >> Besides, when I call pdfDocument = PDDocument.load(new File(fileName))
> >> >> >> I don't have any problems with memory.
> >> >> >> 
> >> >> >> I get the memory problem when I call getPageThumbImage in a for loop.
> >> >> >> 
> >> >> >> I am sure that the problem is in PDFBox. Please help me.
> >> >> >Maybe, but I'm not sure at all.
> >> >> >
> >> >> >Try to use the scratch file.
> >> >> >
> >> >> >> Wednesday, 1 July 2015, 12:48 +02:00, from Andreas Lehmkühler <andreas@lehmi.de>:
> >> >> >> >
> >> >> >> >
> >> >> >> >> Alex Sviridov <ooo_saturn7@mail.ru> wrote on 1 July 2015 at 10:16:
> >> >> >> >> 
> >> >> >> >> 
> >> >> >> >>  I want to display thumbnails of all pages. However, I ran into a
> >> >> >> >> memory problem with PDFRenderer or PDDocument; I don't know which one.
> >> >> >> >> 
> >> >> >> >> I have the following code:
> >> >> >> >>    ....
> >> >> >> >>     private PDDocument pdfDocument;
> >> >> >> >>     
> >> >> >> >>     private PDFRenderer pdfRenderer;
> >> >> >> >> 
> >> >> >> >>     public WritableImage getPageThumbImage(int page){
> >> >> >> >>         WritableImage result=null;
> >> >> >> >>         try {
> >> >> >> >>             BufferedImage bi=pdfRenderer.renderImageWithDPI(page, 12, ImageType.RGB);
> >> >> >> >>             result=SwingFXUtils.toFXImage(bi, null);
> >> >> >> >>         } catch (IOException ex) {
> >> >> >> >>              ....
> >> >> >> >>         }
> >> >> >> >>         return result;
> >> >> >> >>     }
> >> >> >> >>  .....
> >> >> >> >> I run the method getPageThumbImage in a for loop for every page. I set
> >> >> >> >> the Java heap size to 500 MB, and I can get about 30 images using
> >> >> >> >> getPageThumbImage (if I give it more memory, I get more).
> >> >> >> >> In my application I have real-time memory graphs, and they show that
> >> >> >> >> memory fills up very quickly.
> >> >> >> >> When there is no more free memory, getPageThumbImage hangs: no
> >> >> >> >> exception, nothing. The code just stops.
> >> >> >> >> When I set pdfDocument=null, pdfRenderer=null I get about 400 MB of
> >> >> >> >> free memory back.
> >> >> >> >> How can I solve this problem?
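One possible reason the loop "hangs" without any exception (an assumption on my part, not something established in this thread): the work runs inside a javafx.concurrent.Task, which extends java.util.concurrent.FutureTask, and as far as I know a failing Task stores what its job threw, even an OutOfMemoryError, instead of printing it. A nearly full heap can also make the JVM spend most of its time in garbage collection, which looks like a hang too. The stdlib-only sketch below shows the silent-failure effect with a plain FutureTask; the class name is made up. With a JavaFX Task you would inspect task.getException() or register task.setOnFailed(...).

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

/** Shows that an error thrown inside a FutureTask is stored, not printed. */
public class SilentFailureDemo {

    /** Runs a failing job on a background thread and returns what get() reports. */
    public static Throwable runAndCollectFailure() throws InterruptedException {
        FutureTask<Void> task = new FutureTask<>(() -> {
            // Stand-in for a render call that runs out of memory.
            throw new OutOfMemoryError("simulated");
        });
        Thread t = new Thread(task);
        t.start();
        t.join(); // the thread ends quietly: nothing is written to System.err
        try {
            task.get();
            return null;
        } catch (ExecutionException e) {
            return e.getCause(); // the original error only surfaces here
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("background failure: " + runAndCollectFailure());
    }
}
```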
> >> >> >> >There are 2 possible issues, and maybe both are relevant.
> >> >> >> >
> >> >> >> >1. PDFBox consumes more or less memory to load a PDF, depending on
> >> >> >> >the size and the content of the PDF.
> >> >> >> >
> >> >> >> >- Are you using the latest 2.0.0-SNAPSHOT? There were some
> >> >> >> >improvements concerning the memory footprint lately.
> >> >> >> >- Try using a scratch file (there are load methods that take a
> >> >> >> >boolean switch to activate it).
> >> >> >> >
> >> >> >> >2. Your own implementation consumes more or less memory to process
> >> >> >> >those thumbnails.
> >> >> >> >
> >> >> >> >- Check whether you are releasing all the resources you use during
> >> >> >> >your process (especially the images you're creating).
> >> >> >> >
> >> >> >> >
> >> >> >> >HTH,
> >> >> >> >Andreas
> >> >> >> >
> >> >> >> >---------------------------------------------------------------------
> >> >> >> >To unsubscribe, e-mail:  users-unsubscribe@pdfbox.apache.org
> >> >> >> >For additional commands, e-mail:  users-help@pdfbox.apache.org
> >> >> >> >
> >> >> >> 
> >> >> >> 
> >> >> >> -- 
> >> >> >> Alex Sviridov
> >> >> >
> >> >> >BR
> >> >> >Andreas
> >> >> >
> >> >> >
> >> >> 
> >> >> 
> >> >> -- 
> >> >> Alex Sviridov
> >> >> 
> >> >
> >> >
> >> >BR
> >> >Andreas
> >> >
> >> >
> >> 
> >> 
> >> -- 
> >> Alex Sviridov
> >
> >
> 
> 
> -- 
> Alex Sviridov


