pdfbox-users mailing list archives

From "Zeiske, Erik (DualStudy)" <erik.zei...@hpe.com>
Subject RE: Issues with MRC Compressed using JBIG2-image
Date Tue, 08 Nov 2016 08:22:50 GMT
I've used the Extract Images Command Line Tool to get the images.

Erik

-----Original Message-----
From: Tilman Hausherr [mailto:THausherr@t-online.de] 
Sent: Tuesday, 8 November 2016 09:16
To: users@pdfbox.apache.org
Subject: Re: Issues with MRC Compressed using JBIG2-image

What methods did you use to get the images?

What I did was look at the rendering, and it looks the same as in Adobe Reader.

I also looked at the images with PDFDebugger, which shows each image with its mask applied.
The second image is at
Root/Pages/Kids/[0]/Resources/XObject/Im002
and it shows colored text. The image is DCT encoded. The mask is black-and-white text that
is JBIG2 encoded.
http://imgur.com/a/2ofjD
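
To make "mask applied" concrete, here is a minimal sketch in plain Java, using only java.awt and not PDFBox itself. It assumes a black pixel in the bilevel mask marks text and everything else should become transparent; the class and method names are made up for illustration, and the mask polarity in a real PDF may differ.

```java
import java.awt.image.BufferedImage;

/** Illustration only: this is not PDFBox code. */
class MaskSketch {
    /**
     * Keeps a foreground pixel where the mask pixel is black and makes it
     * fully transparent elsewhere. Assumption: black in the mask marks text.
     */
    static BufferedImage applyMask(BufferedImage foreground, BufferedImage mask) {
        int w = foreground.getWidth();
        int h = foreground.getHeight();
        if (mask.getWidth() != w || mask.getHeight() != h) {
            throw new IllegalArgumentException("mask and foreground must have the same size");
        }
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // A black mask pixel means "this is text, keep the colour".
                boolean isText = (mask.getRGB(x, y) & 0xFFFFFF) == 0;
                int rgb = foreground.getRGB(x, y) & 0xFFFFFF;
                out.setRGB(x, y, isText ? (0xFF000000 | rgb) : 0);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tiny 2x1 example: a red foreground, with only the left pixel marked as text.
        BufferedImage fg = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        fg.setRGB(0, 0, 0xFFFF0000);
        fg.setRGB(1, 0, 0xFFFF0000);
        BufferedImage mask = new BufferedImage(2, 1, BufferedImage.TYPE_BYTE_BINARY);
        mask.setRGB(0, 0, 0xFF000000); // text
        mask.setRGB(1, 0, 0xFFFFFFFF); // background
        BufferedImage out = applyMask(fg, mask);
        System.out.printf("%08X %08X%n", out.getRGB(0, 0), out.getRGB(1, 0));
    }
}
```

If an extracted foreground image is a solid block of colour with no text shape, that corresponds to skipping the mask step in this sketch.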

What do you get?

Is the jbig2 decoder in your class path? For PDFDebugger, you need to do
this:

java -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider -cp "pdfbox-app-XXXX.jar;lib/*" org.apache.pdfbox.tools.PDFBox PDFReader filename

the subdir "lib" has the additional jars.
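
A quick way to check the class path question is sketched below. It assumes the JBIG2 decoder registers itself as a Java ImageIO reader under the format name "JBIG2"; the class and method names are made up for illustration.

```java
import javax.imageio.ImageIO;

/** Illustration only: checks whether ImageIO can see a reader for a format. */
class Jbig2PluginCheck {
    /** Returns true if ImageIO finds at least one reader for the given format name. */
    static boolean hasReader(String formatName) {
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    public static void main(String[] args) {
        // Run this with the same -cp setting as the command above.
        System.out.println("JBIG2 reader available: " + hasReader("JBIG2"));
    }
}
```

If this prints false when run with the same class path, the additional jars in "lib" are probably not being picked up.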

Tilman

On 08.11.2016 at 08:31, Zeiske, Erik (DualStudy) wrote:
> Hello Tilman,
>
> You solved the NPE, but there is something else wrong with the output images. In the
> PDF there are 3 images and 2 masks for two of those images. (The PDF is compressed as
> shown here: https://www.abbyy.com/en-us/ocr-sdk-embedded/pdf-mrc/. The foreground is the
> second image of the PDF and uses the JBIG2 image as a mask to get the coloured text. The third
> image and its mask are for the watermark of the PDF and are extracted perfectly fine.) The library
> doesn't apply the mask correctly to the second image. The resulting image should be only the
> text with its colour, but the result is only the colour without the mask applied.
> I hope this makes sense.
>
> Erik.
>
> -----Original Message-----
> From: Tilman Hausherr [mailto:THausherr@t-online.de]
> Sent: Monday, 7 November 2016 18:27
> To: users@pdfbox.apache.org
> Subject: Re: Issues with MRC Compressed using JBIG2-image
>
> Hello Erik,
>
> I've opened
> https://issues.apache.org/jira/browse/PDFBOX-3558
> and fixed the cause of the NPE in the sources. I have not fully understood your text,
> or maybe misunderstood something, and maybe something is now moot; can you please test with
> a snapshot that the rendering is as you want it? The build will be there within a few hours.
> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.4-SNAPSHOT/
>
> Tilman
>
> On 07.11.2016 at 08:06, Zeiske, Erik (DualStudy) wrote:
>> Here is a Dropbox link to download the PDF:
>> https://www.dropbox.com/s/q1t58ov6vybu3k7/scan300_1-6.pdf?dl=0
>> I am using version 2.0.3 of PDFBox.
>>
>> -----Original Message-----
>> From: Tilman Hausherr [mailto:THausherr@t-online.de]
>> Sent: Thursday, 3 November 2016 18:07
>> To: users@pdfbox.apache.org
>> Subject: Re: Issues with MRC Compressed using JBIG2-image
>>
>> On 03.11.2016 at 09:58, Zeiske, Erik (DualStudy) wrote:
>>> Hello everybody,
>>>
>>> I have an issue with PDFBox and the handling of an MRC-compressed PDF.
>>>
>>> The issue is related to the JBIG2 compression used in the PDF. If I
>>> try to extract the different images used in the attached PDF, the
>>> library throws a NullPointerException because the bits are not
>>> defined in the JBIG2 filter. I think this is because there is no
>>> "Bits per Component" defined for the JBIG2 image in the PDF. If I
>>> define the bits in the Java code, the program runs without an
>>> error, but it doesn't apply the JBIG2 mask properly to the
>>> foreground colour image of the PDF. To fix this issue I tried to
>>> extract the mask into a file, but it seems the mask image is the
>>> same as the foreground image.
>>> I couldn't find the reason for this and I don't think it is related
>>> to the PDF itself.
>>>
>>> The PDF I was using is attached to this e-mail.
>>>
>> Please upload the file to a file-sharing host; PDF attachments are not
>> allowed. Please also tell us what version you are using and what
>>
>> Tilman
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>
>