pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Raw Image extraction - possible in PDFBox 1.x - impossible in 2.x
Date Thu, 27 Nov 2014 17:41:10 GMT
Hi,

It is still possible :-) Look in the ExtractImages example:


                 else if ("jpg".equals(suffix))
                 {
                     String colorSpaceName = pdImage.getColorSpace().getName();
                     if (directJPEG ||
                         PDDeviceGray.INSTANCE.getName().equals(colorSpaceName) ||
                         PDDeviceRGB.INSTANCE.getName().equals(colorSpaceName))
                     {
                         // RGB or Gray colorspace: get and write the unmodified JPEG stream.
                         // JPEG is the list of stop filters (the DCT filter names), so the
                         // DCT-encoded bytes are returned without being decoded.
                         InputStream data = pdImage.getStream().getPartiallyFilteredStream(JPEG);
                         try
                         {
                             IOUtils.copy(data, out);
                         }
                         finally
                         {
                             // close the stream even if the copy fails
                             IOUtils.closeQuietly(data);
                         }
                     }
                     else
                     {
                         // for CMYK and other "unusual" colorspaces, the JPEG will be converted
                         ImageIOUtil.writeImage(image, suffix, out);
                     }
                 }

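The snippet above presumably works because the image stream is decoded only up to the DCT filter: any filter listed before the JPEG layer (for example FlateDecode) is undone, and the JPEG bytes themselves, including any EXIF segment, are copied out untouched. The following stdlib-only Java sketch illustrates that "stop filter" idea. It is not PDFBox code; the class `PartialFilter` and its methods are hypothetical, and only FlateDecode is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical sketch of "partial filtering": undo a stream's filters in order,
// but stop before any stop filter so the JPEG (DCT) bytes stay untouched.
class PartialFilter {
    // PDF names for the JPEG filter (full name and inline-image abbreviation).
    static final List<String> JPEG_STOP_FILTERS = Arrays.asList("DCTDecode", "DCT");

    static byte[] decodeUntil(byte[] encoded, List<String> filters, List<String> stopFilters) {
        byte[] data = encoded;
        for (String filter : filters) {
            if (stopFilters.contains(filter)) {
                break; // remaining layers (e.g. the JPEG itself) are returned as-is
            }
            if ("FlateDecode".equals(filter)) {
                data = inflate(data);
            } else {
                // Only FlateDecode is implemented in this sketch.
                throw new UnsupportedOperationException("filter not handled: " + filter);
            }
        }
        return data;
    }

    // FlateDecode data is zlib-wrapped, which is what Inflater expects by default.
    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid Flate data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("invalid Flate data", e);
        } finally {
            inflater.end();
        }
    }
}
```

In PDFBox the equivalent is to pass the DCT filter names as the stop list, which is what the `JPEG` constant in the example appears to be for; the returned bytes are then the original JPEG file.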


Tilman

On 27.11.2014 at 18:28, Jaroslav Půbal wrote:
> Hello,
> I need to extract a raw image from a PDF. The image is EXIF-tagged.
>   
> In PDFBox 1.x it was done with
>    PDXObjectImage.write2OutputStream(os);
>   
> In PDFBox 2.0.0 image extraction can be done with
>    ImageIO.write(((PDImageXObject) image).getImage(), "jpg", os);
> but this is not the raw data from the PDF; the image is completely re-encoded, so the EXIF data is lost.
>   
> How can I extract the raw image in PDFBox 2.0.0?
>   
> Thanks
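
To check whether a JPEG written by either route still carries its EXIF data, a small stdlib-only scan of the JPEG marker segments is enough. This is a hypothetical helper, not part of PDFBox. It assumes EXIF is stored the usual way, in an APP1 segment whose payload starts with "Exif\0\0", and it skips edge cases (fill bytes, standalone markers) that a real JPEG parser must handle.

```java
// Hypothetical helper: does a JPEG byte array carry an EXIF APP1 segment?
// Simplified marker walk; it ignores fill bytes and other edge cases.
class ExifCheck {
    private static final byte[] EXIF_ID = {'E', 'x', 'i', 'f', 0, 0};

    static boolean hasExif(byte[] jpeg) {
        // A JPEG starts with the SOI marker FF D8.
        if (jpeg.length < 4 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            return false;
        }
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                return false; // expected a marker here
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xDA || marker == 0xD9) {
                return false; // SOS or EOI: no more metadata segments follow
            }
            // Segment length is big-endian and includes its own two bytes.
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            if (length < 2) {
                return false; // malformed length
            }
            if (marker == 0xE1 && length >= 2 + EXIF_ID.length
                    && pos + 4 + EXIF_ID.length <= jpeg.length) {
                boolean match = true;
                for (int i = 0; i < EXIF_ID.length; i++) {
                    if (jpeg[pos + 4 + i] != EXIF_ID[i]) {
                        match = false;
                        break;
                    }
                }
                if (match) {
                    return true;
                }
            }
            pos += 2 + length; // skip the marker (2 bytes) plus the segment
        }
        return false;
    }
}
```

Run on the bytes from the raw-copy branch and on the ImageIO re-encode, it should show the difference described above, provided the embedded JPEG had EXIF data to begin with.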

