Date: Sat, 12 Oct 2019 13:41:00 +0000 (UTC)
From: "ASF subversion and git services (Jira)"
To: dev@pdfbox.apache.org
Subject: [jira] [Commented] (PDFBOX-4341) [Patch] PNGConverter: PNG bytes to PDImageXObject converter

[ https://issues.apache.org/jira/browse/PDFBOX-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950034#comment-16950034 ]

ASF subversion and git services
commented on PDFBOX-4341:
---------------------------------------------------------

Commit 1868339 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1868339 ]

PDFBOX-4341: fix matrix setter (it only stored 6 values of the 3x3 matrix), by Emmeran Seehuber

> [Patch] PNGConverter: PNG bytes to PDImageXObject converter
> -----------------------------------------------------------
>
> Key: PDFBOX-4341
> URL: https://issues.apache.org/jira/browse/PDFBOX-4341
> Project: PDFBox
> Issue Type: Improvement
> Components: Writing
> Affects Versions: 2.0.12
> Reporter: Emmeran Seehuber
> Priority: Minor
> Attachments: 001229.png, 001230.png, 008528.png, 014431.png, 016289.png, 017012.png, 017030.png, 017063.png, 017084.png, image-2018-10-25-09-29-47-251.png, optimized.zip, pngconvert_testimg.zip, pngconvert_v1.patch, pngconvert_v2.patch, pngconvert_v3.patch
>
>
> The attached patch implements a PNG bytes to PDImageXObject converter. It tries to create a PDImageXObject from the chunks of a PNG image without recompressing it. This makes it possible to use programs like pngcrush to embed optimally compressed images. It's also much faster than recompressing the image.
> The class PNGConverter does this in three steps:
> - Parsing the PNG chunk structure from the byte array
> - Validating all relevant data chunks (i.e. checking the CRC). Chunks that are not needed (e.g. text chunks) are not validated.
> - Constructing a PDImageXObject from the chunks
> If an error occurs at any of these steps, or the converter detects that the image cannot be mapped, it bails out and returns null. In this case the image has to be embedded the "normal" way, by reading it with ImageIO and compressing it again.
> Only these PNG image types can (at least theoretically) be converted without recompressing the image data:
> - Grayscale
> - Truecolor (i.e. RGB 8-Bit/16-Bit)
> - Indexed
> As soon as transparency is used it gets difficult:
> - Grayscale with alpha / truecolor with alpha: The alpha channel is saved in the image data stream, as pixels are stored as (Gray,Alpha) or (Red,Green,Blue,Alpha) tuples. You have to separate the alpha information for the SMask image. For now, such images can only be read and recompressed using the LosslessFactory.
> - Indexed with alpha: Alpha and color tables are separate in the PNG, so it should be possible to build a grayscale SMask from the image data (which are just the table indices) and the alpha table. I tried that, but Acrobat Reader does not like indexed SMasks… One could instead build a grayscale SMask using the alpha table and the decompressed image index data. This would at least save some space, as the optimized indexed image data is still used.
> With the current patch only truecolor images without alpha work correctly. The other tests for grayscale and indexed fail. (To run the test drivers, you must place the zipped images in the resources folder where png.png resides. These images are "original" work done by me using Gimp, Krita and ImageOptim (on macOS) to build the different PNG image types.)
> Notes for the current patch:
> - The grayscale images have the wrong gamma curve. I tried using the ColorSpace.CS_GRAY ICC profile and the image now seems only "slightly" off (e.g. pixel value FFD6D6D6 vs FFD7D7D7). When a gAMA chunk is present, the image is tagged with a CalGray color space, but then the colors are much further off.
> - The cHRM (chroma) chunk is read and *should* work, as I used the formulas from the PDF spec to convert the cHRM values to the CalRGB white point and matrix. I have not yet tested this, as I have no test image with cHRM at the moment. Note: Matrix(COSArray) and Matrix.toCOSArray() are fine for geometric matrices.
>   But these methods are wrong for any other kind of matrix (e.g. color transform matrices), as they only store/restore 6 values of the 3x3 matrix. I deprecated PDCalRGB.setMatrix(Matrix) because of this, as it never worked and cannot work as long as the Matrix class is meant for geometric use cases only. The Matrix class should also document that it is not general purpose. I added a PDCalRGB.setMatrix(COSArray) method to allow setting the matrix.
> - The indexed image displays fine in Acrobat Reader, but the test driver fails because PDImageXObject.getImage() returns a completely black (all zeros) image. Strange; I suspect an error in the PDFBox image decoding.
> - If an image is tagged with sRGB, the built-in Java sRGB ICC profile is attached. Theoretically you could use a CalRGB color space, but an ICC color profile is likely faster (at least in PDFBox) and more "standard".
> You can also look at this patch on GitHub [https://github.com/apache/pdfbox/compare/2.0...rototor:2.0-png-from-bytes-encoder?expand=1] if you like.
> It would be nice if someone could give me some hints on the color space problems. I will try to reread the specs; maybe I have missed something. But it would be great if someone else who knows about color spaces could also take a look at this.
> As I have no idea how long it will take to understand why the colors are off for grayscale and wrong for indexed images, I could prepare a stripped-down version of this patch that only contains the working parts (i.e. truecolor) and simply does nothing in the cases that don't work yet. What do you think?
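The first two PNGConverter steps quoted above (parsing the chunk structure, then checking CRCs) can be sketched with the JDK alone. This relies only on the layout fixed by the PNG specification: an 8-byte signature, then chunks made of a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 computed over type and data. PngChunkReader and parse are hypothetical names; this is not the code from the attached patch, just a minimal sketch under those assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical helper, not the PNGConverter from the patch. */
final class PngChunkReader {
    /** The fixed 8-byte PNG signature. */
    private static final byte[] SIGNATURE = {
        (byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A
    };

    /** One chunk: its 4-letter type, its payload, and whether the stored CRC matched. */
    static final class Chunk {
        final String type;
        final byte[] data;
        final boolean crcValid;

        Chunk(String type, byte[] data, boolean crcValid) {
            this.type = type;
            this.data = data;
            this.crcValid = crcValid;
        }
    }

    /**
     * Splits a PNG byte array into chunks. Returns null when the signature or
     * the chunk framing is broken ("bail out and return null"). CRC results are
     * recorded per chunk so a caller can ignore mismatches in chunks it does
     * not need, such as text chunks.
     */
    static List<Chunk> parse(byte[] png) {
        if (png.length < SIGNATURE.length
                || !Arrays.equals(Arrays.copyOf(png, SIGNATURE.length), SIGNATURE)) {
            return null;
        }
        // ByteBuffer reads big-endian by default, which is what PNG uses.
        ByteBuffer buf = ByteBuffer.wrap(png, SIGNATURE.length, png.length - SIGNATURE.length);
        List<Chunk> chunks = new ArrayList<>();
        while (buf.remaining() >= 12) { // length + type + CRC need at least 12 bytes
            int length = buf.getInt();
            // Room is needed for the type (4), the data (length) and the CRC (4).
            if (length < 0 || length > buf.remaining() - 8) {
                return null;
            }
            byte[] type = new byte[4];
            buf.get(type);
            byte[] data = new byte[length];
            buf.get(data);
            long storedCrc = buf.getInt() & 0xFFFFFFFFL;

            // The CRC covers the type and data fields, not the length field.
            CRC32 crc = new CRC32();
            crc.update(type);
            crc.update(data);

            String typeName = new String(type, StandardCharsets.US_ASCII);
            chunks.add(new Chunk(typeName, data, crc.getValue() == storedCrc));
            if (typeName.equals("IEND")) {
                break;
            }
        }
        return chunks;
    }
}
```

Keeping the CRC result on each chunk, rather than failing immediately, leaves the "validate only relevant chunks" policy to the caller, which would check crcValid only for chunks such as IHDR, PLTE, IDAT or tRNS.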
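For the grayscale-with-alpha and truecolor-with-alpha cases described in the quoted issue, separating the alpha information for the SMask amounts to de-interleaving each pixel tuple. A minimal sketch, assuming the IDAT data has already been inflated and un-filtered into plain 8-bit samples (16-bit PNGs are not handled here); AlphaSplitter is a hypothetical name, not a PDFBox class.

```java
/** Hypothetical helper: de-interleaves 8-bit (Gray,Alpha) or (R,G,B,A) samples. */
final class AlphaSplitter {
    /** Color samples without alpha, plus one alpha byte per pixel for an SMask. */
    static final class Planes {
        final byte[] color;
        final byte[] alpha;

        Planes(byte[] color, byte[] alpha) {
            this.color = color;
            this.alpha = alpha;
        }
    }

    /**
     * @param samples         inflated, un-filtered pixel data, one byte per sample
     * @param colorComponents 1 for Gray+Alpha, 3 for RGB+Alpha
     */
    static Planes split(byte[] samples, int colorComponents) {
        int stride = colorComponents + 1; // alpha is always the last sample of a pixel
        if (samples.length % stride != 0) {
            throw new IllegalArgumentException("sample count is not a multiple of " + stride);
        }
        int pixels = samples.length / stride;
        byte[] color = new byte[pixels * colorComponents];
        byte[] alpha = new byte[pixels];
        for (int p = 0; p < pixels; p++) {
            System.arraycopy(samples, p * stride, color, p * colorComponents, colorComponents);
            alpha[p] = samples[p * stride + colorComponents];
        }
        return new Planes(color, alpha);
    }
}
```

Both planes still have to be compressed again before embedding, which is why these cases cannot reuse the original IDAT stream and fall back to recompression.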
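The workaround suggested in the quoted issue for indexed images with alpha (a grayscale SMask built from the alpha table and the decompressed index data) is a per-pixel table lookup. A minimal sketch, assuming 8-bit indices that are already inflated and un-filtered (1-, 2- and 4-bit indices would need unpacking first). Per the PNG specification, palette entries beyond the end of the tRNS table are fully opaque. IndexedAlpha and smaskFromIndices are hypothetical names.

```java
/** Hypothetical helper: builds an 8-bit grayscale SMask for an indexed PNG. */
final class IndexedAlpha {
    /**
     * @param indices palette index of each pixel, one byte per pixel
     * @param trns    alpha values from the tRNS chunk, indexed by palette entry
     * @return one alpha byte per pixel (0 = transparent, 255 = opaque)
     */
    static byte[] smaskFromIndices(byte[] indices, byte[] trns) {
        byte[] mask = new byte[indices.length];
        for (int i = 0; i < indices.length; i++) {
            int index = indices[i] & 0xFF; // Java bytes are signed
            // Palette entries not covered by tRNS are opaque.
            mask[i] = index < trns.length ? trns[index] : (byte) 0xFF;
        }
        return mask;
    }
}
```

Only the mask is new data here; the indexed image itself could keep its original, optimized IDAT stream.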
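The commit fixes a setter that kept only 6 of the 9 values of the 3x3 CalRGB matrix, and the quoted cHRM note describes deriving that matrix and the white point from chromaticities. A sketch of the usual derivation, written as plain Java rather than against any PDFBox API: each (x, y) chromaticity becomes an (x, y, 1 - x - y) column, the columns are scaled so that the three primaries add up to the white point with Y = 1, and the result is flattened in the CalRGB order [XA YA ZA XB YB ZB XC YC ZC]. ChrmToCalRgb, whitePoint and matrix are hypothetical names; this is not the code from the patch.

```java
/** Hypothetical helper: cHRM chromaticities to CalRGB WhitePoint and Matrix. */
final class ChrmToCalRgb {
    /** Determinant of a row-major 3x3 matrix. */
    private static double det(double[][] m) {
        return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
             - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
             + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    }

    /** CalRGB WhitePoint [Xw 1 Zw] for the white chromaticity (xw, yw). */
    static double[] whitePoint(double xw, double yw) {
        return new double[] {xw / yw, 1.0, (1 - xw - yw) / yw};
    }

    /** All nine CalRGB Matrix entries [XA YA ZA XB YB ZB XC YC ZC]. */
    static double[] matrix(double xw, double yw, double xr, double yr,
                           double xg, double yg, double xb, double yb) {
        // Column j holds the (x, y, z) chromaticity of primary j (R, G, B).
        double[][] m = {
            {xr, xg, xb},
            {yr, yg, yb},
            {1 - xr - yr, 1 - xg - yg, 1 - xb - yb}
        };
        double[] w = whitePoint(xw, yw);
        double d = det(m);
        double[] out = new double[9];
        for (int col = 0; col < 3; col++) {
            // Cramer's rule: the scale s of this primary in the system m * s = w.
            double[][] mc = new double[3][];
            for (int row = 0; row < 3; row++) {
                mc[row] = m[row].clone();
                mc[row][col] = w[row];
            }
            double s = det(mc) / d;
            for (int row = 0; row < 3; row++) {
                out[col * 3 + row] = m[row][col] * s; // column-major: A, then B, then C
            }
        }
        return out;
    }
}
```

With the sRGB primaries and the D65 white point, the Y entries come out close to the familiar 0.2126 / 0.7152 / 0.0722 luminance weights, which is a quick sanity check. All nine values are needed, so storing them through a geometric six-value matrix cannot round-trip.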