pdfbox-dev mailing list archives

From "Hee Jeong Kim (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PDFBOX-4598) oversized jbig2 decoded result that causing unnecessary operation
Date Sat, 20 Jul 2019 23:52:00 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889609#comment-16889609 ]

Hee Jeong Kim commented on PDFBOX-4598:

Thank you for your help!! (y)

> oversized jbig2 decoded result that causing unnecessary operation
> -----------------------------------------------------------------
>                 Key: PDFBOX-4598
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4598
>             Project: PDFBox
>          Issue Type: Bug
>          Components: JBIG2
>    Affects Versions: 3.0.2 JBIG2
>            Reporter: Hee Jeong Kim
>            Priority: Minor
>         Attachments: amb_2.jb2, approach_1.patch, approach_2.patch, approach_3.patch,
sample.pdf, use_packed_raster_to_read_Jbig2_image.patch
> Hi! I am using pdfbox 2.0.16 and jbig2-imageio 3.0.2 to read JBIG2 images, and found an issue to report.
> It seems that jbig2-imageio creates an oversized BufferedImage, which in turn makes pdfbox do unnecessary work.
> To read a JBIG2 image, pdfbox with jbig2-imageio does the following:
> 1. Find the JBIG2 ImageReader (https://github.com/apache/pdfbox/blob/2.0.16/pdfbox/src/main/java/org/apache/pdfbox/filter/JBIG2Filter.java#L67)
> 2. Read the image and get a BufferedImage as the result (https://github.com/apache/pdfbox/blob/2.0.16/pdfbox/src/main/java/org/apache/pdfbox/filter/JBIG2Filter.java#L106)
> 2-1. JBIG2 ImageIO 3.0.2 gets the decoded bitmap (https://github.com/apache/pdfbox-jbig2/blob/3.0.2/src/main/java/org/apache/pdfbox/jbig2/JBIG2ImageReader.java#L249)
> 2-2. It returns that bitmap as a BufferedImage (https://github.com/apache/pdfbox-jbig2/blob/3.0.2/src/main/java/org/apache/pdfbox/jbig2/JBIG2ImageReader.java#L259)
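Step 1 above goes through the standard javax.imageio plugin lookup. The following is a minimal sketch of that kind of lookup, not the code in JBIG2Filter; the class and method names are made up for illustration, and a JBIG2 reader is only found when a plugin such as jbig2-imageio is on the classpath.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Illustrative sketch only; ReaderLookupSketch and findReader are hypothetical names.
class ReaderLookupSketch {
    // Returns the first ImageReader registered for the format name,
    // or null when no plugin for that format is registered.
    static ImageReader findReader(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext() ? readers.next() : null;
    }

    public static void main(String[] args) {
        // Without a JBIG2 plugin on the classpath this reports that no reader exists.
        ImageReader reader = findReader("JBIG2");
        System.out.println(reader == null
                ? "No JBIG2 ImageReader registered"
                : "Found " + reader.getClass().getName());
    }
}
```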
> The problem is this:
> At step 2-1, a Bitmap of roughly 59MB is created for the JBIG2 image on the second page of sample.pdf (which is correct),
> but an oversized BufferedImage (roughly 473MB) is returned at step 2-2.
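The roughly eightfold gap between 59MB and 473MB is what one would expect if the bitmap stores one bit per pixel while the BufferedImage stores one byte per pixel. Here is a small sketch of that arithmetic. The page dimensions are not given in this report, so the width and height below are assumed values chosen only to land in the same range, and the class is hypothetical.

```java
// Hypothetical helper, not part of pdfbox or jbig2-imageio.
class RasterSizeEstimate {
    // Bytes needed when 8 pixels share one byte; each row is padded to a whole byte.
    static long packedBytes(int width, int height) {
        long bytesPerRow = (width + 7L) / 8L;
        return bytesPerRow * height;
    }

    // Bytes needed when every pixel occupies a full byte.
    static long bytePerPixelBytes(int width, int height) {
        return (long) width * height;
    }

    public static void main(String[] args) {
        int width = 20000;   // assumed, not taken from sample.pdf
        int height = 24000;  // assumed, not taken from sample.pdf
        System.out.println("1 bit per pixel:  " + packedBytes(width, height) + " bytes");
        System.out.println("8 bits per pixel: " + bytePerPixelBytes(width, height) + " bytes");
    }
}
```

With these assumed dimensions the two results are 60,000,000 and 480,000,000 bytes, the same 8x ratio as the reported figures.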
> I think this is because jbig2-imageio uses a raster based on a PixelInterleavedSampleModel and an 8-bit IndexColorModel.
> https://github.com/apache/pdfbox-jbig2/blob/3.0.2/src/main/java/org/apache/pdfbox/jbig2/image/Bitmaps.java#L177
> https://github.com/apache/pdfbox-jbig2/blob/3.0.2/src/main/java/org/apache/pdfbox/jbig2/image/Bitmaps.java#L286
> https://github.com/apache/pdfbox-jbig2/blob/3.0.2/src/main/java/org/apache/pdfbox/jbig2/image/Bitmaps.java#L291
> This also makes pdfbox check the pixel size of the color model of the resulting BufferedImage,
> https://github.com/apache/pdfbox/blob/2.0.16/pdfbox/src/main/java/org/apache/pdfbox/filter/JBIG2Filter.java#L116
> and, since that size is not 1, create another BufferedImage of binary type (JBIG2 is 1-bit):
> https://github.com/apache/pdfbox/blob/2.0.16/pdfbox/src/main/java/org/apache/pdfbox/filter/JBIG2Filter.java#L122
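The check-and-copy step described above might look roughly like the sketch below. It is not a copy of JBIG2Filter; the class and method names are hypothetical, and an 8-bit grayscale image stands in for the reader's output. It shows why the extra work happens: whenever the incoming image is not 1 bit per pixel, a second full-size image is allocated and every pixel is redrawn into it.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Illustrative sketch only; not the actual pdfbox implementation.
class BinaryConversionSketch {
    // Returns the image unchanged if it is already 1 bit per pixel,
    // otherwise draws it into a new TYPE_BYTE_BINARY image.
    static BufferedImage toOneBit(BufferedImage image) {
        if (image.getColorModel().getPixelSize() == 1) {
            return image;
        }
        BufferedImage packed = new BufferedImage(
                image.getWidth(), image.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        Graphics2D g = packed.createGraphics();
        g.drawImage(image, 0, 0, null);
        g.dispose();
        return packed;
    }

    public static void main(String[] args) {
        // Stand-in for the 8-bit image returned by the reader.
        BufferedImage eightBit = new BufferedImage(64, 32, BufferedImage.TYPE_BYTE_GRAY);
        BufferedImage oneBit = toOneBit(eightBit);
        System.out.println("before: " + eightBit.getColorModel().getPixelSize() + " bits per pixel");
        System.out.println("after:  " + oneBit.getColorModel().getPixelSize() + " bits per pixel");
    }
}
```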
> I think we should call createPackedRaster and use the returned raster, which is based on MultiPixelPackedSampleModel, together with a 1-bit IndexColorModel, since JBIG2 is for bi-level images. Please check the attached patch. I tested with the patch, and it seems to work well.
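The proposal can be illustrated with plain java.awt.image calls. This is a sketch of the general technique, not the attached patch: the class and method names are hypothetical, and both the row layout (each row padded to a whole byte) and the palette polarity (index 0 = white, index 1 = black) are assumptions.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.awt.image.IndexColorModel;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

// Illustrative sketch only; not the attached patch.
class PackedRasterSketch {
    // Wraps an existing 1-bit-per-pixel byte array without copying it.
    // Assumes each row occupies (width + 7) / 8 bytes.
    static BufferedImage wrapPacked(byte[] packed, int width, int height) {
        DataBufferByte buffer = new DataBufferByte(packed, packed.length);
        // createPackedRaster builds a raster backed by a MultiPixelPackedSampleModel.
        WritableRaster raster =
                Raster.createPackedRaster(buffer, width, height, 1, new Point(0, 0));
        // Assumed polarity: index 0 = white, index 1 = black.
        byte[] palette = {(byte) 0xFF, 0x00};
        IndexColorModel colorModel = new IndexColorModel(1, 2, palette, palette, palette);
        return new BufferedImage(colorModel, raster, false, null);
    }

    public static void main(String[] args) {
        // 16x2 image: first pixel of row 0 and last pixel of row 1 are set.
        byte[] data = {(byte) 0x80, 0x00, 0x00, 0x01};
        BufferedImage image = wrapPacked(data, 16, 2);
        System.out.println("bits per pixel: " + image.getColorModel().getPixelSize());
        System.out.println("sample model:   "
                + image.getRaster().getSampleModel().getClass().getSimpleName());
    }
}
```

Because the raster wraps the byte array directly, the image needs no more memory than the packed data itself, and a pixel-size check like the one in step 2 would find 1 bit per pixel.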
> You can reproduce this issue with the second page of the sample.pdf file that I attached.
> You can also download the file from here: http://www.newsgn.com/data/newsgn_com/pdf/201802/2018022229524590.pdf

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org
