pdfbox-users mailing list archives

From Alex Sviridov <ooo_satu...@mail.ru>
Subject Re[6]: PDFRenderer, PDDocument memory issue
Date Wed, 01 Jul 2015 11:38:50 GMT
The file is here: https://yadi.sk/i/Y0fTuvHmhbZiE

I tried load(fileName, true). Now I no longer have memory problems. However, I now
have two other problems:

1) All the thumbnail images load, but VERY slowly: each thumbnail image takes
about 4 seconds!

2) Also, as you can see, the thumbnail images are loaded in a separate thread. If,
while this thread is running, I try to render the big image for the main content
using BufferedImage bi=pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
I get the following exception:

java.io.IOException: java.util.zip.DataFormatException: unknown compression method
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
    at org.apache.pdfbox.cos.COSStream.attemptDecode(COSStream.java:422)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:398)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:335)
    at org.apache.pdfbox.cos.COSStream.checkUnfilteredBuffer(COSStream.java:265)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredRandomAccess(COSStream.java:239)
    at org.apache.pdfbox.pdfparser.BaseParser.<init>(BaseParser.java:146)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:78)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:451)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:438)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:180)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:205)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:136)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:95)
  ....
    at javafx.concurrent.Task$TaskCallable.call(Task.java:1423)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.zip.DataFormatException: unknown compression method
    at java.util.zip.Inflater.inflateBytes(Native Method)
    at java.util.zip.Inflater.inflate(Inflater.java:259)
    at java.util.zip.Inflater.inflate(Inflater.java:280)
    at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:101)
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:74)
    ... 20 more

How can I solve these problems?
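A likely cause of the second problem is that the thumbnail Task and the main-content code use the same PDDocument/PDFRenderer from two threads at once. PDFBox objects are not meant to be shared between threads, and the FlateFilter error above is consistent with two threads reading the same stream data concurrently. One way around this, shown here as a minimal sketch under stated assumptions, is to funnel every render request for a document through a single worker thread. PageRenderer and SerialRenderer are hypothetical names, not PDFBox API; in the application, PageRenderer.render would wrap pdfRenderer.renderImageWithDPI(page, dpi, ImageType.RGB).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical abstraction over the real rendering call, e.g.
 * (page, dpi) -> pdfRenderer.renderImageWithDPI(page, dpi, ImageType.RGB).
 */
@FunctionalInterface
interface PageRenderer<T> {
    T render(int page, float dpi) throws Exception;
}

/**
 * Hypothetical helper: runs all render requests for one document on a single
 * worker thread, so the document is never accessed by two threads at once.
 */
final class SerialRenderer<T> implements AutoCloseable {
    private final PageRenderer<T> renderer;
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    SerialRenderer(PageRenderer<T> renderer) {
        this.renderer = renderer;
    }

    /** Queues a request; requests run one at a time, in submission order. */
    Future<T> submit(int page, float dpi) {
        Callable<T> job = () -> renderer.render(page, dpi);
        return worker.submit(job);
    }

    /** Lets already queued requests finish, then stops the worker thread. */
    @Override
    public void close() {
        worker.shutdown();
    }
}
```

Blocking on Future.get() is fine inside the background thumbnail Task, but the JavaFX application thread should not block on it. Because the queue is first-in, first-out, a 300 DPI request still waits behind thumbnails that are already queued; submitting thumbnails one at a time, or loading a second PDDocument used only for the main view, are alternatives worth considering.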


Wednesday, 1 July 2015, 13:17 +02:00, from Andreas Lehmkühler <andreas@lehmi.de>:
>
>
>> Alex Sviridov <ooo_saturn7@mail.ru> wrote on 1 July 2015 at 13:09:
>> 
>> 
>> I decided to show all the code. I am also sending the PDF file, some file from
>> the internet that I use for testing.
>The attachment didn't make it through because of restrictions on the mailing list.
>Please post a link to the original source or another place where we can download
>the PDF in question.
>
>> 
>> Task<Void> task = new Task<Void>() {
>>     @Override protected Void call() throws Exception {
>>         for (int i=0;i<model.getTotalPages();i++){
>>             System.out.println("Point a:"+i);
>>             WritableImage writableImage=model.getPageThumbImage(i);
>>             System.out.println("Point b:"+i);
>>             ImageView imageView=new ImageView(writableImage);
>>             System.out.println("Point c:"+i);
>>             Label label=new Label(Integer.toString(i+1));
>>             System.out.println("Point d:"+i);
>>             VBox vBox=new VBox(imageView,label);
>>             System.out.println("Point e:"+i);
>>             vBox.setAlignment(Pos.CENTER);
>>             vBox.setStyle("-fx-padding:5px 5px 5px 5px;-fx-background-color:red");
>>             System.out.println("Point f:"+i);
>>             Platform.runLater(new Runnable() {
>>                 @Override
>>                 public void run() {
>>                      thumbFlowPane.getChildren().add(vBox);
>>                 }
>>             });
>>         }
>>         return null;
>>     }
>> };
>> new Thread(task).start();
>> 
>> And here is the tail of the output:
>> ....
>> Point a:30
>> Point b:30
>> Point c:30
>> Point d:30
>> Point e:30
>> Point f:30
>> Point a:31
>> 
>> What is a scratch file? Sorry, I don't understand.
>
>PDFBox holds a lot of temporary data in memory. To reduce the memory
>footprint, one can choose to use a scratch file instead, so that some or most of
>that data is held in a file.
>
>To do so, simply use another load method, e.g. 
>
>load(File file, boolean useScratchFiles)
>> 
>> 
>> 
>> 
>> 
>> 
>> Wednesday, 1 July 2015, 13:04 +02:00, from Andreas Lehmkühler <andreas@lehmi.de>:
>> >
>> >
>> >> Alex Sviridov <ooo_saturn7@mail.ru> wrote on 1 July 2015 at 12:58:
>> >> 
>> >> 
>> >> Thank you for your answer. I tried pdfbox-app-2.0.0-20150630.220424-1464.jar;
>> >> the result is the same.
>> >> 
>> >> When I create images, I add them to a JavaFX FlowPane. However, the problem
>> >> is not the images because, I repeat, I get 400 MB of free memory back when I do
>> >> pdfDocument=null, pdfRenderer=null.
>> >> 
>> >> Besides, when I do pdfDocument = PDDocument.load(new File(fileName)), I
>> >> don't have any problems with memory.
>> >> 
>> >> I get the memory problem when I call getPageThumbImage in a for loop.
>> >> 
>> >> I am sure that the problem is in PDFBox. Please help me.
>> >Maybe, but I'm not sure at all.
>> >
>> >Try to use the scratch file.
>> >
>> >> Wednesday, 1 July 2015, 12:48 +02:00, from Andreas Lehmkühler <andreas@lehmi.de>:
>> >> >
>> >> >
>> >> >> Alex Sviridov <ooo_saturn7@mail.ru> wrote on 1 July 2015 at 10:16:
>> >> >> 
>> >> >> 
>> >> >> I want to display thumbnails of all pages. However, I ran into a memory
>> >> >> problem with PDFRenderer or PDDocument; I don't know which one.
>> >> >> 
>> >> >> I have the following code:
>> >> >>    ....
>> >> >>     private PDDocument pdfDocument;
>> >> >>     
>> >> >>     private PDFRenderer pdfRenderer;
>> >> >> 
>> >> >>     public WritableImage getPageThumbImage(int page){
>> >> >>         WritableImage result=null;
>> >> >>         try {
>> >> >>             BufferedImage bi=pdfRenderer.renderImageWithDPI(page, 12, ImageType.RGB);
>> >> >>             result=SwingFXUtils.toFXImage(bi, null);
>> >> >>         } catch (IOException ex) {
>> >> >>              ....
>> >> >>         }
>> >> >>         return result;
>> >> >>     }
>> >> >>  .....
>> >> >> I call getPageThumbImage in a for loop for every page. I set the Java
>> >> >> heap to 500 MB, and I can get about 30 images using getPageThumbImage
>> >> >> (with more memory I get more).
>> >> >> In my application I have real-time memory graphs, and they show that
>> >> >> memory fills up very fast.
>> >> >> When there is no more free memory, getPageThumbImage hangs: no
>> >> >> exception, nothing. The code just stops.
>> >> >> When I do pdfDocument=null, pdfRenderer=null, I get about 400 MB of free
>> >> >> memory back.
>> >> >> How can I solve this problem?
>> >> >There are 2 possible issues and maybe both are relevant.
>> >> >
>> >> >1. PDFBox consumes more or less memory to load a PDF, depending on the
>> >> >size and the content of the PDF.
>> >> >
>> >> >- Are you using the latest 2.0.0-SNAPSHOT? There were some recent
>> >> >improvements concerning the memory footprint.
>> >> >- Try using a scratch file (there are load methods that take a boolean
>> >> >switch to activate it).
>> >> >
>> >> >2. Your own implementation consumes more or less memory to process those
>> >> >thumbnails.
>> >> >
>> >> >- Check that you release all resources you use during the process
>> >> >(especially the images you're creating).
>> >> >
>> >> >HTH,
>> >> >Andreas
>> >> >
>> >> >---------------------------------------------------------------------
>> >> >To unsubscribe, e-mail:  users-unsubscribe@pdfbox.apache.org
>> >> >For additional commands, e-mail:  users-help@pdfbox.apache.org
>> >> >
>> >> 
>> >> 
>> >> -- 
>> >> Alex Sviridov
>> >
>> >BR
>> >Andreas
>> >
>> >
>> 
>> 
>> -- 
>> Alex Sviridov
>> 
>
>
>BR
>Andreas
>
>


-- 
Alex Sviridov
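To put the numbers in this thread in perspective, here is a rough, hedged estimate of how much pixel memory the rendered images themselves need. It assumes an A4 page (595 × 842 pt; the actual test file may differ), an output size of about floor(points × DPI / 72) pixels per side, and 4 bytes per pixel as in BufferedImage.TYPE_INT_RGB. RenderSizeEstimate is a hypothetical helper, and the extra JavaFX copy made by SwingFXUtils.toFXImage is not counted.

```java
/**
 * Back-of-the-envelope bitmap sizes for one rendered page.
 * Assumptions (not taken from PDFBox itself): the output is about
 * floor(points * dpi / 72) pixels per side, and each pixel takes 4 bytes,
 * as in BufferedImage.TYPE_INT_RGB. The JavaFX copy made by
 * SwingFXUtils.toFXImage is not counted.
 */
final class RenderSizeEstimate {

    /** Pixels along one side of a page that is {@code points} long (1 pt = 1/72 inch). */
    static int pixels(double points, int dpi) {
        return (int) Math.floor(points * dpi / 72.0);
    }

    /** Approximate bytes for a 4-bytes-per-pixel RGB image of the page. */
    static long rgbBytes(double widthPt, double heightPt, int dpi) {
        return 4L * pixels(widthPt, dpi) * pixels(heightPt, dpi);
    }

    public static void main(String[] args) {
        double widthPt = 595, heightPt = 842; // assumed A4 page; the real file may differ
        for (int dpi : new int[] {12, 300}) {
            System.out.printf("%d dpi: %d x %d px, about %d bytes%n",
                    dpi, pixels(widthPt, dpi), pixels(heightPt, dpi),
                    rgbBytes(widthPt, heightPt, dpi));
        }
        System.out.printf("30 thumbnails at 12 dpi: about %d bytes%n",
                30 * rgbBytes(widthPt, heightPt, 12));
    }
}
```

Under these assumptions a 12 DPI thumbnail is 99 × 140 px (about 55 KB), so 30 thumbnails need only about 1.7 MB, far below the roughly 400 MB released by setting pdfDocument and pdfRenderer to null. That supports the view that most of the memory was held by the loaded document rather than by the thumbnails, and it suggests the ~4 seconds per thumbnail goes into parsing and decoding page content rather than filling pixels. A single 300 DPI page, by contrast, is about 2479 × 3508 px, or roughly 35 MB.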