Subject: Re: Issues with MRC Compressed using JBIG2-image
From: Tilman Hausherr
To: users@pdfbox.apache.org
Date: Tue, 8 Nov 2016 19:10:15 +0100

Hello Erik,

I've identified the problem and created the issue
https://issues.apache.org/jira/browse/PDFBOX-3559
where it has been fixed. The cause was a "fast path" for JPEG files that ignored the mask. Please try again with a snapshot build once it is available:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.4-SNAPSHOT/

I tested it myself. The output does not look very good, but this could be because IrfanView misidentifies the ARGB file as a CMYK JPEG (which it isn't), or because Java doesn't save it properly. The result may look different in other software.

Tilman

On 08.11.2016 at 09:22, Zeiske, Erik (DualStudy) wrote:
> I've used the ExtractImages command-line tool to get the images.
>
> Erik
>
> -----Original Message-----
> From: Tilman Hausherr [mailto:THausherr@t-online.de]
> Sent: Tuesday, 8 November 2016 09:16
> To: users@pdfbox.apache.org
> Subject: Re: Issues with MRC Compressed using JBIG2-image
>
> What methods did you use to get the images?
>
> What I did was look at the rendering, and it looks the same as in Adobe Reader.
>
> I also looked at the images with PDFDebugger, which shows the images with the mask applied. The second image is at
> Root/Pages/Kids/[0]/Resources/XObject/Im002
> and it shows colored text. The image is DCT encoded. The mask is black-and-white text that is JBIG2 encoded.
> http://imgur.com/a/2ofjD
>
> What do you get?
>
> Is the JBIG2 decoder in your class path? For PDFDebugger, you need to do this:
>
> java -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider -cp "pdfbox-app-XXXX.jar;lib/*" org.apache.pdfbox.tools.PDFBox PDFReader filename
>
> The subdirectory "lib" holds the additional jars.
>
> Tilman
>
> On 08.11.2016 at 08:31, Zeiske, Erik (DualStudy) wrote:
>> Hello Tilman,
>>
>> You solved the NPE, but there is something else wrong with the output images. In the PDF there are 3 images and 2 masks for two of those images. (The PDF is compressed as shown here: https://www.abbyy.com/en-us/ocr-sdk-embedded/pdf-mrc/. The foreground is the second image of the PDF and uses the JBIG2 image as a mask to get the coloured text. The third image and its mask are for the watermark of the PDF and are extracted perfectly fine.) The library doesn't apply the mask correctly to the second image. The resulting image should be only the text with its colour, but the result is only the colour without the mask applied.
>> I hope this makes sense.
>>
>> Erik.
>>
>> -----Original Message-----
>> From: Tilman Hausherr [mailto:THausherr@t-online.de]
>> Sent: Monday, 7 November 2016 18:27
>> To: users@pdfbox.apache.org
>> Subject: Re: Issues with MRC Compressed using JBIG2-image
>>
>> Hello Erik,
>>
>> I've opened
>> https://issues.apache.org/jira/browse/PDFBOX-3558
>> and fixed the cause of the NPE in the sources. I have not fully understood your text, or maybe misunderstood something, and maybe something is now moot; can you please test with a snapshot that the rendering is as you want it? The build will be there within a few hours.
>> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.4-SNAPSHOT/
>>
>> Tilman
>>
>> On 07.11.2016 at 08:06, Zeiske, Erik (DualStudy) wrote:
>>> Here is a Dropbox link to download the PDF:
>>> https://www.dropbox.com/s/q1t58ov6vybu3k7/scan300_1-6.pdf?dl=0
>>> I am using version 2.0.3 of PDFBox.
>>>
>>> -----Original Message-----
>>> From: Tilman Hausherr [mailto:THausherr@t-online.de]
>>> Sent: Thursday, 3 November 2016 18:07
>>> To: users@pdfbox.apache.org
>>> Subject: Re: Issues with MRC Compressed using JBIG2-image
>>>
>>> On 03.11.2016 at 09:58, Zeiske, Erik (DualStudy) wrote:
>>>> Hello everybody,
>>>>
>>>> I have an issue with PDFBox and the handling of an MRC-compressed PDF.
>>>>
>>>> The issue is related to the JBIG2 compression used in the PDF. If I
>>>> try to extract the different images used in the attached PDF, the
>>>> library throws a NullPointerException because the bits are not
>>>> defined in the JBIG2 filter. I think this is because in the PDF
>>>> there is no "Bits per Component" defined in the JBIG2 image. If I
>>>> try to define the bits in the Java code, the program runs without an
>>>> error, but it doesn't apply the JBIG2 mask properly to the
>>>> foreground colour image of the PDF. To fix this issue I tried to
>>>> extract the mask into a file, but it seems like the mask image is the same as the foreground image.
>>>> I couldn't find the reason for this and I don't think it is related
>>>> to the PDF itself.
>>>>
>>>> The PDF I was using is attached to this e-mail.
>>>>
>>> Please upload the file to a sharehoster; PDF attachments are not
>>> allowed. Please also tell us what version you are using and what
>>>
>>> Tilman

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org
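
For readers following the thread, the masking step Erik describes (a colour foreground image combined with a black-and-white mask so that only the text keeps its colour) can be sketched with plain java.awt. This is not PDFBox code and not the PDFBOX-3559 fix. The class name MaskSketch and the method applyMask are hypothetical, and the sketch assumes that the mask and the foreground have the same pixel size and that dark mask pixels mean "keep the colour"; real MRC files may use different resolutions or the opposite polarity.

```java
import java.awt.image.BufferedImage;

/** Illustrative only: a hypothetical helper, not part of PDFBox. */
final class MaskSketch {

    /**
     * Combines a colour image with a black-and-white mask of the same size.
     * Assumption for this sketch: dark mask pixels keep the colour,
     * light mask pixels become fully transparent.
     */
    static BufferedImage applyMask(BufferedImage colour, BufferedImage mask) {
        int w = colour.getWidth();
        int h = colour.getHeight();
        if (mask.getWidth() != w || mask.getHeight() != h) {
            throw new IllegalArgumentException("sketch expects equal image sizes");
        }
        // The result needs an alpha channel, hence ARGB.
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = colour.getRGB(x, y) & 0x00FFFFFF;
                int m = mask.getRGB(x, y);
                int brightness = (((m >> 16) & 0xFF) + ((m >> 8) & 0xFF) + (m & 0xFF)) / 3;
                int alpha = brightness < 128 ? 0xFF : 0x00;
                out.setRGB(x, y, (alpha << 24) | rgb);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two-pixel demo: left pixel is "text" (black in the mask), right pixel is not.
        BufferedImage colour = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        colour.setRGB(0, 0, 0xFF0000);
        colour.setRGB(1, 0, 0x00FF00);
        BufferedImage mask = new BufferedImage(2, 1, BufferedImage.TYPE_BYTE_BINARY);
        mask.setRGB(1, 0, 0xFFFFFFFF); // white: masked out
        BufferedImage out = applyMask(colour, mask);
        System.out.println(Integer.toHexString(out.getRGB(0, 0)));
        System.out.println(Integer.toHexString(out.getRGB(1, 0)));
    }
}
```

Writing such a result with ImageIO.write(out, "png", file) keeps the alpha channel. JPEG has no alpha channel, which may be related to the viewer confusion mentioned at the top of the thread.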