pdfbox-users mailing list archives

From "Dustin T. Clifford" <dcliff...@sitesoftllc.net>
Subject Problems converting PDF and extracting Images
Date Mon, 21 Dec 2009 04:09:56 GMT

I'm new to PDFBox and I'm having trouble converting the pages of a PDF to JPEG images.
The issue seems very similar to the one described in the linked thread.


I used the ImageMagick CLI tools to confirm that the PDF itself is not the problem; everything
renders just fine there. Below are links to two images of the same page (one created by the
ImageMagick CLI and one by PDFBox).

 http://www.sitesoftllc.net/images/imagemagick.jpg    (ImageMagick Created)
 http://www.sitesoftllc.net/images/pdfbox.jpg         (PDFBox Created)
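
A comparison like the one above can also be put into numbers. The following is a minimal sketch, not part of the original program: the class name ImageDiff and the method meanChannelDifference are hypothetical, and it assumes both renderings have been loaded as BufferedImage objects with identical pixel dimensions (which may require rendering both at the same resolution first).

```java
import java.awt.image.BufferedImage;

public class ImageDiff {

    /**
     * Returns the mean absolute difference per color channel (0 to 255)
     * between two images of the same size. The alpha channel is ignored.
     */
    public static double meanChannelDifference(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("Images must have the same dimensions");
        }
        long total = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                int p = a.getRGB(x, y);
                int q = b.getRGB(x, y);
                total += Math.abs(((p >> 16) & 0xFF) - ((q >> 16) & 0xFF)); // red
                total += Math.abs(((p >> 8) & 0xFF) - ((q >> 8) & 0xFF));   // green
                total += Math.abs((p & 0xFF) - (q & 0xFF));                 // blue
            }
        }
        return total / (3.0 * a.getWidth() * a.getHeight());
    }
}
```

The two JPEG files could be loaded with javax.imageio.ImageIO.read(File) before being passed in. A value near zero would suggest the renderings match; a large value would confirm the color shift independently of how the images are displayed.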

To check whether I was doing something wrong, I next extracted the images embedded in the
PDF. Those appear discolored as well (result linked below).

 http://www.sitesoftllc.net/images/extractedimage.jpg (Extracted Image)
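
One way to narrow down where the discoloration comes from is to inspect the BufferedImage before it is encoded. The sketch below is only an illustration: the class name ImageInfo and the method describe are hypothetical, and the idea that an alpha channel or a non-sRGB color space could be involved is an assumption, not a confirmed diagnosis of this problem.

```java
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;

public class ImageInfo {

    /** Summarizes image properties worth checking before JPEG encoding. */
    public static String describe(BufferedImage img) {
        ColorModel cm = img.getColorModel();
        ColorSpace cs = cm.getColorSpace();
        return "type=" + img.getType()              // BufferedImage.TYPE_* constant
                + " components=" + cm.getNumComponents()
                + " hasAlpha=" + cm.hasAlpha()
                + " colorSpaceType=" + cs.getType() // ColorSpace.TYPE_* constant
                + " isSRGB=" + cs.isCS_sRGB();
    }

    public static void main(String[] args) {
        // Synthetic images stand in for the page and extracted images here.
        BufferedImage rgb = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        BufferedImage argb = new BufferedImage(4, 4, BufferedImage.TYPE_INT_ARGB);
        System.out.println(describe(rgb));
        System.out.println(describe(argb));
    }
}
```

Calling describe on both the rendered page image and each extracted image would show whether they share the same image type and color space.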

Has anybody experienced this? Is there a solution? Please find my code below. I appreciate
any help in finding out what I'm doing wrong.

package com.sitesoftllc.pdf;

import java.awt.image.BufferedImage;
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

import com.sun.image.codec.jpeg.JPEGCodec;
import com.sun.image.codec.jpeg.JPEGImageEncoder;

public class PdfBoxPdfReader {

	public static void main(String[] args){
		try {
			BufferedInputStream bis = new BufferedInputStream(new FileInputStream(new File("files/january.pdf")));
			PDDocument doc = PDDocument.load(bis);
			List pages = (List)doc.getDocumentCatalog().getAllPages();

			int pageCount = doc.getNumberOfPages();

			int pageNumber = 0;
			Iterator it = pages.iterator();
			while (it.hasNext()){
				String imageType = "jpg";
				String fileName = "files/pages/page-"+pageNumber+"."+imageType;
				PDPage thisPage = (PDPage)it.next();

				// Render the whole page to an image.
				BufferedImage image = thisPage.convertToImage();
				FileOutputStream pageFos = new FileOutputStream(fileName);

				// Extract every image embedded in this page's resources.
				PDResources resources = thisPage.getResources();
				Map images = resources.getImages();
				Iterator imageIt = images.keySet().iterator();
				while (imageIt.hasNext()){
					String imageKey = (String) imageIt.next();
					PDXObjectImage pdfImage = (PDXObjectImage)images.get(imageKey);
					BufferedImage bImage = pdfImage.getRGBImage();
					FileOutputStream imageFos = new FileOutputStream(new File("files/extracted/"+imageKey+".jpg"));
					JPEGImageEncoder jpgEncoder = JPEGCodec.createJPEGEncoder(imageFos);
					jpgEncoder.encode(bImage);
					imageFos.close();
				}

				// Write the rendered page as a JPEG.
				JPEGImageEncoder jpgEncoder = JPEGCodec.createJPEGEncoder(pageFos);
				jpgEncoder.encode(image);
				pageFos.close();
				pageNumber++;
			}
			doc.close();
		} catch (Exception e){
			e.printStackTrace();
		}
	}
}

Dustin Clifford 
SiteSoft L.L.C. 
8063 20th. St. 
Jenison, MI 49428 

e. dclifford@sitesoftllc.net 
p. (616) 901-8693 
f. (616) 667-9622 
