Date: Tue, 11 Sep 2018 10:38:00 +0000 (UTC)
From: "Timo Boehme (JIRA)"
To: dev@pdfbox.apache.org
Subject: [jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

    [ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610415#comment-16610415 ]

Timo Boehme commented on PDFBOX-4309:
-------------------------------------

I've found the reason why the 'direct-draw' solution is slower than mine and is also much slower on other pages of the problematic document (e.g. 9 seconds vs. 0.2 seconds): PDICCBased.loadICCProfile() performs some operations that are meant to trigger exceptions so that it can fall back to the alternate color space. The trigger awtColorSpace.toRGB() causes (in my environment) a 0.4 second delay; it seems that internally it also uses the slow color-convert operation.

I wanted to check whether an alternative operation without this side effect could be used, but I found no document that triggers the exception (in my environment). The code contains the following references to problematic documents:
 * PDFBOX-1295: triggers an exception, but with the trigger 'ComponentColorModel', not 'toRGB'
 * PDFBOX-1740: same as PDFBOX-1295
 * PDFBOX-3610: no exception

Thus it's not clear to me whether the 'toRGB' trigger is still needed. At least I would like to have a switch to disable this trigger, with the trigger 'on' by default for compatibility.
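
A switch of the kind requested here could look roughly like the following. This is only a minimal sketch under stated assumptions: the property name `example.pdfbox.iccToRGBTrigger` and the class `ToRgbTriggerSwitch` are hypothetical and are not part of PDFBox; only `System.getProperty` and `java.awt.color.ColorSpace#toRGB` are real JDK calls.

```java
import java.awt.color.ColorSpace;

public class ToRgbTriggerSwitch {
    // Hypothetical property name, for illustration only (not a PDFBox property).
    static final String PROP = "example.pdfbox.iccToRGBTrigger";

    // The trigger stays 'on' unless the property is explicitly set to "false".
    static boolean isTriggerEnabled() {
        return !"false".equalsIgnoreCase(System.getProperty(PROP));
    }

    // Runs the toRGB() probe only when the trigger is enabled.
    // Returns true if the probe was executed, false if it was skipped.
    static boolean probe(ColorSpace cs) {
        if (!isTriggerEnabled()) {
            return false;
        }
        // May throw for a broken profile; the caller would then fall back.
        cs.toRGB(new float[cs.getNumComponents()]);
        return true;
    }
}
```

With this shape, existing behaviour is unchanged by default, and the probe is skipped only when the (hypothetical) property is set to `false`, e.g. via `-Dexample.pdfbox.iccToRGBTrigger=false`.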
For PDFBox version 3.x we could maybe remove it, in case we don't find any documents the trigger is good for.

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> -----------------------------------------------------------
>
>                 Key: PDFBOX-4309
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4309
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Rendering
>    Affects Versions: 2.0.11, 3.0.0 PDFBox
>            Reporter: Timo Boehme
>            Assignee: Timo Boehme
>            Priority: Minor
>              Labels: optimization
>         Attachments: PDColorSpace.java.patch, PDICCBased.java.patch
>
>
> This is a continuation of PDFBOX-3569. A (private) PDF document contains graphics produced by CorelDraw which are composed of more than 2500(!) images, each with its own indexed color space based on an ICC color space (the shadows of graphic objects are created by a large number of gray lines ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) rendering a single page with one graphic takes 780 seconds.
> Most of the time is spent creating the indexed color space via the ICC color space mapping:
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>         at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
>         at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
>         at sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
>         - locked <0x0000000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
>         at sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
>         at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
>         at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
>         at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
>         at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
>         at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
>         at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
>         at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
>         at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>         at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
>         at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
>         at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>         at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
>         at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
>         at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
>         at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
>         at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046){noformat}
> The problem here is that LittleCMS (LCMS) is called several thousand times, which takes way too much time. Unfortunately, using KCMS via {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is not an option either, as the Suse IcedTea OpenJDK does not seem to include it (anymore?) in either Java 7 or Java 8.
> However, the ICC color space (PDICCBased) returns CMYK as its alternate color space in this case, and for CMYK we have the alternative rendering via the system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from PDFBOX-3569.
> The idea now is to have an option that forces use of the alternate color space instead of the ICC one, in order to avoid LCMS in toRGBImage(). For CMYK as the alternate color space it has to be combined with the system property 'UsePureJavaCMYKConversion'.
> Using this approach the rendering time of the page with the problematic graphic drops from 780 seconds to 1 second!
> It is clear that using the alternate color space might return wrong or inexact colors. Therefore this mode should only be available as an option that has to be enabled.
> However, for processing large collections of PDF documents (e.g. focusing on text) or for displaying a PDF in a timely manner, the performance improvement should outweigh the drop in image quality.
> While the provided patch will, if activated, use the alternate color space in any case, it would be possible at a later stage to add more intelligent logic which decides, based on a runtime analysis, when to use this mode (number of calls to LCMS, time needed etc.).
> If there are no objections to this patch I will apply it in the next few days.
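
The runtime analysis mentioned in the issue description (number of calls to LCMS, time needed) could be sketched as a simple budget check. This is an illustration only, not the attached patch: the class `IccFallbackHeuristic`, its methods and its thresholds are hypothetical names, and the caller would be responsible for timing each ICC conversion and reporting it.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper (not part of PDFBox): decides when to switch to the alternate color space. */
public class IccFallbackHeuristic {
    private final long maxCalls;   // allowed number of ICC conversions
    private final long maxNanos;   // allowed total conversion time in nanoseconds
    private final AtomicLong calls = new AtomicLong();
    private final AtomicLong nanos = new AtomicLong();

    public IccFallbackHeuristic(long maxCalls, long maxMillis) {
        this.maxCalls = maxCalls;
        this.maxNanos = maxMillis * 1_000_000L;
    }

    /** Records one ICC conversion and the time it took. */
    public void record(long elapsedNanos) {
        calls.incrementAndGet();
        nanos.addAndGet(elapsedNanos);
    }

    /** Returns true once either the call budget or the time budget is exceeded. */
    public boolean useAlternate() {
        return calls.get() > maxCalls || nanos.get() > maxNanos;
    }
}
```

A renderer could wrap each conversion with `System.nanoTime()`, pass the difference to `record(...)`, and consult `useAlternate()` before creating the next color space. Suitable budget values would have to be determined by measurement.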