pdfbox-users mailing list archives

From Joe Ye <yuanzhou...@gmail.com>
Subject Re: Problem with extracted JPEG images with RGB colorspace (from a PDF)
Date Tue, 08 Dec 2015 18:06:37 GMT
Thanks Tilman! That worked!

Any information on when you'll have a new release with this fix in?

Regards,
Joe

On Mon, Dec 7, 2015 at 4:33 PM, Tilman Hausherr <THausherr@t-online.de>
wrote:

> Hi,
>
> The good news is that this bug was fixed last weekend in PDFBOX-3153. Get
> the latest trunk and see ExtractImages.java, or do this to get a JPEG
> stream:
>
> InputStream dctStream =
> img.createInputStream(Arrays.asList(COSName.DCT_DECODE.getName()));
>
>
>
> Tilman
>
>
> On 07.12.2015 at 13:54, Joe Ye wrote:
>
>> Hi,
>>
>>
>> We've been using PDFBox to extract images from PDF files and recently
>> upgraded to PDFBox version 2.0.0-RC2. I noticed that the class
>> PDXObjectImage has been renamed/rewritten and that the method
>> PDXObjectImage.write2OutputStream, which we used to write images to disk,
>> no longer exists.
>>
>>
>>
>> Therefore, I've been trying to use the new class PDImageXObject and follow
>> your example org.apache.pdfbox.tools.ExtractImages#write2file in order to
>> extract images from PDF and write them to disk. It appears that there's a
>> code path (IOUtils.copy etc) for RGB or Gray colorspace where it just
>> copies the unmodified JPEG stream. However, I have a couple of JPEG images
>> with RGB colorspace in a PDF and used this code to extract and write them
>> to disk, and they can't be opened by any image viewer, suggesting that the
>> images may be damaged…
>>
>>
>>
>> If I change the code to call ImageIOUtil.writeImage instead, then the
>> extracted images can be viewed ok. But I don't know the implications here,
>> as the code suggests that the JPEG will be converted.
>>
>>
>>
>> Please could you suggest why IOUtils.copy for RGB or Gray did not work
>> properly and what's the recommended/correct way to process them?
>>
>>
>> Kind regards,
>>
>> Joe
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
>
>
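
For anyone hitting the same symptom (extracted .jpg files that no viewer will open), a quick sanity check is to look at the first and last bytes of the written file. A JPEG stream starts with the start-of-image marker FF D8 and normally ends with the end-of-image marker FF D9. The sketch below is not part of PDFBox; the class name JpegMarkerCheck and its methods are made up for illustration, and it uses only the JDK. If the start check fails, the bytes written to disk were probably not the original DCT-encoded stream. The end check is only a hint, since some valid files carry trailing bytes after the end marker.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not a PDFBox class: checks JPEG start/end markers.
final class JpegMarkerCheck {

    // True if the data starts with the JPEG start-of-image marker (FF D8).
    static boolean hasJpegStart(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8;
    }

    // True if the data ends with the JPEG end-of-image marker (FF D9).
    static boolean hasJpegEnd(byte[] data) {
        int n = data.length;
        return n >= 2
                && (data[n - 2] & 0xFF) == 0xFF
                && (data[n - 1] & 0xFF) == 0xD9;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JpegMarkerCheck <extracted-file>");
            return;
        }
        // Reads the whole file into memory; fine for typical embedded images.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("starts with FF D8: " + hasJpegStart(data));
        System.out.println("ends with FF D9:   " + hasJpegEnd(data));
    }
}
```

Running it against a file written by the old code path and against one written from the stream obtained with the DCT_DECODE stop filter should show whether the first one ever contained JPEG data at all.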
