pdfbox-dev mailing list archives

From "Timo Boehme (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
Date Tue, 11 Sep 2018 10:38:00 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610415#comment-16610415 ]

Timo Boehme commented on PDFBOX-4309:

I've found the reason why the 'direct-draw' solution is slower than mine, and much slower
on other pages of the problematic document (e.g. 9 seconds vs. 0.2 seconds): PDICCBased.loadICCProfile()
performs some operations solely to trigger exceptions, so that it can fall back to the alternate
color space. One of these triggers, awtColorSpace.toRGB(), causes a 0.4 second delay in my
environment; internally it seems to use the same slow color-convert operation.
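
To check whether a toRGB() call on its own is this expensive in a given environment, a
minimal timing sketch could look like the following. It uses the JDK's built-in sRGB profile
as a stand-in for the ICC profile embedded in the document (an assumption, since the private
PDF is not available), and the class name ToRgbTiming is made up for this example. It is not
the code in PDICCBased.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_ColorSpace;
import java.awt.color.ICC_Profile;

public class ToRgbTiming {

    // Times one toRGB() call on a freshly created ICC color space.
    // Assumption: the built-in sRGB profile stands in for the document's profile.
    static long timeToRgbNanos() {
        ICC_Profile profile = ICC_Profile.getInstance(ColorSpace.CS_sRGB);
        ICC_ColorSpace colorSpace = new ICC_ColorSpace(profile);
        float[] components = new float[colorSpace.getNumComponents()];

        long start = System.nanoTime();
        float[] rgb = colorSpace.toRGB(components);
        long elapsed = System.nanoTime() - start;

        if (rgb.length != 3) {
            throw new IllegalStateException("expected 3 RGB components");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // The first call usually includes the color management setup cost;
        // later calls show whether that cost is paid again per color space.
        for (int i = 1; i <= 3; i++) {
            System.out.printf("toRGB() call %d: %.2f ms%n", i, timeToRgbNanos() / 1e6);
        }
    }
}
```

Each call builds a new color space on purpose, because the problematic document creates a
new one per image.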

I wanted to check whether an alternative operation without this side effect could be used,
but I found no document that triggers the exception (in my environment). The code contains
the following references to problematic documents:
 * PDFBOX-1295: triggers an exception but with trigger 'ComponentColorModel', not the 'toRGB'
 * PDFBOX-1740: same as PDFBOX-1295
 * PDFBOX-3610: no exception

Thus it's not clear to me whether the 'toRGB' trigger is still needed. At the least, I would
like to have a switch to disable this trigger, with the trigger 'on' by default for compatibility.
For PDFBox version 3.x we could maybe remove it, in case we don't find any documents the
trigger is good for.
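
Such a switch could be sketched as a system property that leaves the trigger enabled unless
it is explicitly turned off. The property name and the class below are hypothetical and chosen
only for illustration; nothing here is an existing PDFBox option.

```java
public class IccTriggerSwitch {

    // Hypothetical property name, used only in this example.
    static final String PROPERTY = "example.pdfbox.icc.toRgbTrigger";

    // The trigger stays enabled unless the property is explicitly set to "false",
    // so existing behaviour is kept by default.
    static boolean isToRgbTriggerEnabled() {
        return !"false".equalsIgnoreCase(System.getProperty(PROPERTY));
    }

    public static void main(String[] args) {
        System.out.println("toRGB trigger enabled: " + isToRgbTriggerEnabled());
    }
}
```

With this shape, the toRGB() call in loadICCProfile() would be skipped only when the property
is set to false, for example with -Dexample.pdfbox.icc.toRgbTrigger=false on the command line.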

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> -----------------------------------------------------------
>                 Key: PDFBOX-4309
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4309
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Rendering
>    Affects Versions: 2.0.11, 3.0.0 PDFBox
>            Reporter: Timo Boehme
>            Assignee: Timo Boehme
>            Priority: Minor
>              Labels: optimization
>         Attachments: PDColorSpace.java.patch, PDICCBased.java.patch
> This is a continuation of PDFBOX-3569. In a (private) PDF document there are graphics
> produced by CorelDraw which are composed of more than 2500(!) images, each with its own indexed
> color space based on an ICC color space (the shadows of graphic objects are created by a large
> number of gray lines ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux
> 64Bit) rendering a single page with one graphic takes 780 seconds. Most of the time is spent
> in creating the indexed color space via ICC color space mapping:
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>         at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
>         at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
>         at sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
>         - locked <0x0000000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
>         at sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
>         at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
>         at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
>         at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
>         at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
>         at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
>         at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
>         at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
>         at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>         at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
>         at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
>         at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>         at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
>         at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
>         at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
>         at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
>         at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046){noformat}
> Calling LittleCMS (LCMS) several thousand times is the problem here, taking way too much
> time. Unfortunately, using kcms via {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}}
> is also not an option, as the Suse IcedTea OpenJDK seems not to include it (anymore?)
> in either Java 7 or Java 8.
> However, in this case the ICC color space (PDICCBased) returns CMYK as its alternate color
> space, and for CMYK we have the alternative rendering via the system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion
> from PDFBOX-3569.
> The idea is now to have an option that forces use of the alternate color space instead
> of the ICC one, to avoid using LCMS in toRGBImage(). With CMYK as the alternate color space,
> it has to be combined with the system property 'UsePureJavaCMYKConversion'.
> Using this approach the rendering time of the page with the problematic graphic drops
from 780 seconds to 1 second!
> It is clear that using the alternate color space might return wrong or inexact colors.
> Therefore this mode should only be an opt-in option. However, for processing large collections
> of PDF documents (e.g. focusing on text) or for displaying a PDF in a timely manner, the performance
> improvement should outweigh the drop in image quality.
> While the provided patch will, if activated, use the alternate color space in every case,
> a later stage could add more intelligent logic that decides, based on runtime
> analysis, when to use this mode (number of calls to LCMS, time needed etc.).
> If there are no objections to this patch, I will apply it in the next few days.
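
As an illustration of the CMYK part of the description above, the system property from
PDFBOX-3569 can be set programmatically before any rendering starts. This is only a sketch:
the class name is made up, and the value "true" is an assumption about how the property is
evaluated. The option for forcing the alternate color space is not named in the description,
so it is left out here.

```java
public class RenderingProperties {

    // Property name as given in the issue description (PDFBOX-3569).
    static final String PURE_JAVA_CMYK =
            "org.apache.pdfbox.rendering.UsePureJavaCMYKConversion";

    // Assumption: the property is read as a boolean flag, so "true" enables it.
    static void enablePureJavaCmykConversion() {
        System.setProperty(PURE_JAVA_CMYK, "true");
    }

    public static void main(String[] args) {
        // Set the flag before any color space is created or any page is rendered.
        enablePureJavaCmykConversion();
        System.out.println(PURE_JAVA_CMYK + "=" + System.getProperty(PURE_JAVA_CMYK));
    }
}
```

The same flag can be passed on the command line as
-Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true.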
