pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Issues with MRC Compressed using JBIG2-image
Date Tue, 08 Nov 2016 18:10:15 GMT
Hello Erik,

I've identified the problem and created issue 
https://issues.apache.org/jira/browse/PDFBOX-3559 where it has been 
fixed. The cause was a "fast path" for jpeg files that ignored the mask. 
Please try again with a snapshot build when it is there.

https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.4-SNAPSHOT/

I tested it myself. The output does not look very good, but this could be 
because IrfanView misidentifies the ARGB file as a CMYK JPEG (which it 
isn't) or because Java doesn't save it properly. The result may look 
different in other software.

Tilman
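
The ARGB-as-JPEG problem described above can be sidestepped by writing the image in a format that stores an alpha channel. The following is a minimal sketch using only the standard javax.imageio API; it is not PDFBox's extraction code, and the class and method names (ArgbPngRoundTrip, toPng, fromPng) are hypothetical. It shows that an ARGB image written as PNG keeps its transparency after a round trip.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration; not part of PDFBox.
final class ArgbPngRoundTrip {

    // Encodes the image as PNG, a format that can store the alpha channel.
    static byte[] toPng(BufferedImage image) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            if (!ImageIO.write(image, "png", out)) {
                throw new IOException("no PNG writer available");
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes PNG bytes back into an image.
    static BufferedImage fromPng(byte[] bytes) {
        try {
            return ImageIO.read(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedImage argb = new BufferedImage(2, 1, BufferedImage.TYPE_INT_ARGB);
        argb.setRGB(0, 0, 0xFFFF0000); // opaque red
        argb.setRGB(1, 0, 0x00000000); // fully transparent

        BufferedImage back = fromPng(toPng(argb));
        // Print the alpha byte of each pixel after the round trip.
        System.out.println("alpha of pixel 0: " + (back.getRGB(0, 0) >>> 24));
        System.out.println("alpha of pixel 1: " + (back.getRGB(1, 0) >>> 24));
    }
}
```

Viewing the PNG in a tool other than IrfanView would then help separate a viewer problem from a problem in the saved file.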



On 08.11.2016 at 09:22, Zeiske, Erik (DualStudy) wrote:
> I've used the Extract Images Command Line Tool to get the images.
>
> Erik
>
> -----Original Message-----
> From: Tilman Hausherr [mailto:THausherr@t-online.de]
> Sent: Tuesday, 8 November 2016 09:16
> To: users@pdfbox.apache.org
> Subject: Re: Issues with MRC Compressed using JBIG2-image
>
> What methods did you use to get the images?
>
> What I did was look at the rendering, and it looks the same as in Adobe Reader.
>
> I also looked at the images with PDFDebugger, which shows the images with the mask applied. The second image is at
> Root/Pages/Kids/[0]/Resources/XObject/Im002
> and it shows colored text. The image is DCT encoded. The mask is black and white text that is JBIG2 encoded.
> http://imgur.com/a/2ofjD
>
> What do you get?
>
> Is the jbig2 decoder in your class path? For PDFDebugger, you need to do
> this:
>
> java -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider -cp "pdfbox-app-XXXX.jar;lib/*" org.apache.pdfbox.tools.PDFBox PDFDebugger filename
>
> The subdirectory "lib" contains the additional jars.
>
> Tilman
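
Whether a JBIG2 decoder is visible on the class path can be checked from Java before running any extraction. This is a minimal sketch under the assumption that the decoder registers itself as an ImageIO plugin under the format name "JBIG2"; the class name Jbig2ReaderCheck and the method hasReaderFor are hypothetical.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical helper for illustration; not part of PDFBox.
final class Jbig2ReaderCheck {

    // Returns true if ImageIO has at least one reader registered for the format.
    static boolean hasReaderFor(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // Assumption: the JBIG2 plugin registers under the name "JBIG2".
        System.out.println("JBIG2 reader available: " + hasReaderFor("JBIG2"));
        // PNG ships with the JDK, so this acts as a sanity check.
        System.out.println("PNG reader available: " + hasReaderFor("png"));
    }
}
```

Run it with the same -cp value as the command above; if the JBIG2 line prints false, the additional jars in "lib" are probably not being picked up.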
>
> On 08.11.2016 at 08:31, Zeiske, Erik (DualStudy) wrote:
>> Hello Tilman,
>>
>> You solved the NPE, but there is something else wrong with the output images. In the PDF there are 3 images and 2 masks for two of those images. (The PDF is compressed as shown here: https://www.abbyy.com/en-us/ocr-sdk-embedded/pdf-mrc/. The foreground is the second image of the PDF and uses the JBIG2 image as a mask to get the coloured text. The third image and its mask are for the watermark of the PDF and are extracted perfectly fine.) The library doesn't apply the mask correctly to the second image. The resulting image should be only the text with its colour, but the result is only the colour without the mask applied.
>> I hope this makes sense.
>>
>> Erik.
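
The expected result described above (only the text, in its colour) can be illustrated with plain java.awt.image classes. This is a minimal sketch, not PDFBox's implementation. It assumes that black mask pixels mark the text to keep (the polarity in a real PDF depends on the mask's Decode array) and that the foreground may have a lower resolution than the mask, as is common with MRC. The names MaskedForeground and applyMask are hypothetical.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration; not part of PDFBox.
final class MaskedForeground {

    // Produces an ARGB image at the mask's resolution: foreground colour where
    // the mask is black (assumed to mean "paint"), fully transparent elsewhere.
    static BufferedImage applyMask(BufferedImage foreground, BufferedImage mask) {
        int w = mask.getWidth();
        int h = mask.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            // Nearest-neighbour lookup into the (possibly smaller) foreground.
            int fy = y * foreground.getHeight() / h;
            for (int x = 0; x < w; x++) {
                int fx = x * foreground.getWidth() / w;
                boolean paint = (mask.getRGB(x, y) & 0xFFFFFF) == 0; // black pixel
                int rgb = foreground.getRGB(fx, fy) & 0xFFFFFF;
                out.setRGB(x, y, paint ? (0xFF000000 | rgb) : 0x00000000);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 1x1 foreground holding the text colour, 2x1 one-bit mask.
        BufferedImage foreground = new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB);
        foreground.setRGB(0, 0, 0x3366CC);
        BufferedImage mask = new BufferedImage(2, 1, BufferedImage.TYPE_BYTE_BINARY);
        mask.setRGB(0, 0, 0xFF000000); // black: text pixel
        mask.setRGB(1, 0, 0xFFFFFFFF); // white: background pixel

        BufferedImage result = applyMask(foreground, mask);
        System.out.printf("%08X %08X%n", result.getRGB(0, 0), result.getRGB(1, 0));
    }
}
```

An extraction that skips this step yields the raw colour layer, which matches the "only the colour without the mask applied" symptom.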
>>
>> -----Original Message-----
>> From: Tilman Hausherr [mailto:THausherr@t-online.de]
>> Sent: Monday, 7 November 2016 18:27
>> To: users@pdfbox.apache.org
>> Subject: Re: Issues with MRC Compressed using JBIG2-image
>>
>> Hello Erik,
>>
>> I've opened
>> https://issues.apache.org/jira/browse/PDFBOX-3558
>> and fixed the cause for the NPE in the sources. I have not fully understood your text or maybe misunderstood something, and maybe something is now moot; can you please test with a snapshot that the rendering is like you want it? The build will be there within a few hours.
>> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.4-SNAPSHOT/
>>
>> Tilman
>>
>> On 07.11.2016 at 08:06, Zeiske, Erik (DualStudy) wrote:
>>> Here is a Dropbox link to download the PDF:
>>> https://www.dropbox.com/s/q1t58ov6vybu3k7/scan300_1-6.pdf?dl=0
>>> I am using version 2.0.3 of PDFBox.
>>>
>>> -----Original Message-----
>>> From: Tilman Hausherr [mailto:THausherr@t-online.de]
>>> Sent: Thursday, 3 November 2016 18:07
>>> To: users@pdfbox.apache.org
>>> Subject: Re: Issues with MRC Compressed using JBIG2-image
>>>
>>> On 03.11.2016 at 09:58, Zeiske, Erik (DualStudy) wrote:
>>>> Hello everybody,
>>>>
>>>> I have an issue with PDFBox and its handling of an MRC-compressed PDF.
>>>>
>>>> The issue is related to the JBIG2 compression used in the PDF. If I
>>>> try to extract the different images used in the attached PDF, the
>>>> library throws a NullPointerException because the bits are not
>>>> defined in the JBIG2 filter. I think this is because there is no
>>>> "Bits per Component" defined for the JBIG2 image in the PDF. If I
>>>> try to define the bits in the Java code, the program runs without an
>>>> error, but it doesn't apply the JBIG2 mask properly to the
>>>> foreground colour image of the PDF. To fix this issue I tried to
>>>> extract the mask into a file, but it seems like the mask image is the
>>>> same as the foreground image.
>>>> I couldn't find the reason for this and I don't think it is related
>>>> to the PDF itself.
>>>>
>>>> The PDF I was using is attached to this e-mail.
>>>>
>>> Please upload the file to a sharehoster, PDF attachments are not
>>> allowed. Please tell also what version you are using and what
>>>
>>> Tilman
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>>

