pdfbox-dev mailing list archives

From "Itai Shaked (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PDFBOX-4392) PDF completely blow up the RAM on amazon instances
Date Mon, 03 Dec 2018 12:30:00 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707121#comment-16707121 ]

Itai Shaked commented on PDFBOX-4392:

I started looking at the warning message, and at the potential performance impact of copying
and re-parsing the entire ICC profile when it is Perceptual, and I came across something
peculiar, which may be a bug.

On line 220 of PDICCBased.java, the following test is used to check whether the profile
is perceptual:


{{  if (profileData[ICC_Profile.icHdrRenderingIntent] == ICC_Profile.icPerceptual)  }}

Here icHdrRenderingIntent has the value 64 and icPerceptual is 0.

The ICC format specification (http://www.color.org/specification/ICC1v43_2010-12.pdf) has
a table on page 19 describing the format of the header, in which the Rendering Intent field
does indeed occupy bytes 64-67 of the header. On page 23, however, where Rendering Intent is
described, it says "The field is a uInt32Number in which the least-significant 16 bits shall be used
to encode the rendering intent. The most significant 16 bits shall be set to zero (0000h)."
Since the entire format is Big-Endian, this to me means the value to be checked should
actually be at index 67 (profileData[ICC_Profile.icHdrRenderingIntent + 3]). Index 64 holds
the most significant byte, which is always zero in a conforming profile, so the current
test will always return true, regardless of the rendering intent.
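
The difference between the two checks can be illustrated with a minimal sketch. It uses a synthetic, zero-filled 128-byte header rather than a real profile, and the class and method names (RenderingIntentCheck, isPerceptualFirstByte, renderingIntent, isPerceptual) are hypothetical and not part of PDFBox. Only the ICC_Profile constants come from the JDK.

```java
import java.awt.color.ICC_Profile;
import java.nio.ByteBuffer;

class RenderingIntentCheck {

    // The check as quoted above: it looks only at byte 64, which is the
    // most significant byte of the big-endian uInt32 field.
    static boolean isPerceptualFirstByte(byte[] profileData) {
        return profileData[ICC_Profile.icHdrRenderingIntent] == ICC_Profile.icPerceptual;
    }

    // Reads bytes 64-67 as a big-endian 32-bit value (ByteBuffer's default
    // byte order) and keeps the least-significant 16 bits, as the spec describes.
    static int renderingIntent(byte[] profileData) {
        int field = ByteBuffer.wrap(profileData, ICC_Profile.icHdrRenderingIntent, 4).getInt();
        return field & 0xFFFF;
    }

    // Possible alternative check based on the full field (an assumption,
    // not the actual PDFBox fix).
    static boolean isPerceptual(byte[] profileData) {
        return renderingIntent(profileData) == ICC_Profile.icPerceptual;
    }

    public static void main(String[] args) {
        // Synthetic header: all zeros except the rendering intent,
        // which is set to relative colorimetric (1) in byte 67.
        byte[] header = new byte[128];
        header[ICC_Profile.icHdrRenderingIntent + 3] = (byte) ICC_Profile.icRelativeColorimetric;

        System.out.println(isPerceptualFirstByte(header)); // true: byte 64 is zero
        System.out.println(isPerceptual(header));          // false: intent is 1
    }
}
```

With this synthetic header the byte-64 check reports "perceptual" even though the intent stored in the field is relative colorimetric, which is the behaviour I suspect above.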

If this is indeed the case, PDFBox may be wrongly changing profiles that are not Perceptual,
which may impact both performance and accuracy.

Or am I misunderstanding the specification? 

> PDF completely blow up the RAM on amazon instances
> --------------------------------------------------
>                 Key: PDFBOX-4392
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4392
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.12
>            Reporter: Oleksandr Skoryi
>            Priority: Major
>             Fix For: 2.0.13
>         Attachments: 2f0f8f77-7a85-416d-b5d2-47a07d1416d4_3.pdf
> Hi all,
> The issue is pretty straightforward. I receive a lot of PDFs every day and render them.
> In most cases everything is OK, but PDFs which produce
> WARN org.apache.pdfbox.pdmodel.graphics.color.PDICCBased - ICC profile is Perceptual,
> ignoring, treating as Display class
> take a very long time to render and consume a lot of memory.
> Rendering takes from 5 to 15 minutes on an m5.large Amazon instance, but the attached PDF
> completely killed the instance: the Java process is simply killed by Linux during processing,
> with no exception in the logs.
> Could you please explain what is going on with files that produce the WARN message
> above, and how I can improve the rendering?
> Here are my VM options:
> -Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true -Xmx3G -Xms2G -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
> Also, don't hesitate to ask me for more PDFs; I have tons of them :D
> One more question: does the GPU have any influence on rendering?

