pdfbox-users mailing list archives

From Carl Buxbaum <cbuxb...@bamboorose.com>
Subject Difficulty compressing images in a fop generated pdf
Date Mon, 30 Oct 2017 19:13:39 GMT
Hi all,

I seem to be stumped with this.

I am taking a FOP-generated PDF as the source and trying to compress its images. I have this
bit of code that compresses the images for a page:


private static void getImagesFromResources(PDResources resources, PDDocument document, float quality)
        throws IOException {
    // Copy the names first, because the XObject dictionary is modified inside the loop
    Iterator<COSName> objectNames = resources.getXObjectNames().iterator();
    ArrayList<COSName> objectNamesArray = new ArrayList<COSName>();
    while (objectNames.hasNext()) {
        objectNamesArray.add(objectNames.next());
    }

    for (int i = 0; i < objectNamesArray.size(); i++) {
        COSName xObjectName = objectNamesArray.get(i);
        PDXObject xObject = resources.getXObject(xObjectName);

        if (xObject instanceof PDFormXObject) {
            // skip this, not a use case we will encounter
        } else if (xObject instanceof PDImageXObject) {
            System.out.println("replacing Image");
            PDImageXObject imageObject = (PDImageXObject) xObject;
            BufferedImage image = imageObject.getImage();
            // re-encode as JPEG; quality ranges from 0.0f (smallest file) to 1.0f (best quality)
            PDImageXObject newImageObject = JPEGFactory.createFromImage(document, image, quality);
            resources.put(xObjectName, newImageObject);
        }
    }
}


Here is the snippet that calls that code:


File sourceFile = tempFile;
String fileName = FilenameUtils.getBaseName(tempFile.getName());
File destFile;
try {
    destFile = File.createTempFile(fileName, ".pdf", tempdir);
} catch (IOException e2) {
    throw new ApplicationException("Could not create temporary file", e2);
}

PDDocument document = null;
try {
    document = PDDocument.load(tempFile);
} catch (InvalidPasswordException e) {
    throw new ApplicationException("Could not load input PDF file", e);
} catch (IOException e) {
    throw new ApplicationException("Could not load input PDF file", e);
}

PDStream stream = new PDStream(document);
try {
    is = stream.createInputStream();
} catch (IOException e1) {
    throw new ApplicationException("Could not create input stream", e1);
}

try {
    for (int i = 0; i < document.getNumberOfPages(); i++) {
        PDPage page = document.getPage(i);
        try {
            getImagesFromResources(page.getResources(), document, quality);
        } catch (IOException e) {
            throw new ApplicationException("Could not retrieve images from PDF file", e);
        }
    }

…


I am passing in the resources associated with each page. The problem seems to be that all
of the image resources appear on all pages, so I end up processing all of the images multiple
times. Also, in the original document, although all of the resources are present on all
pages, each page somehow “knows” which ones to use and which to ignore. So in my processed
document, when I add the new images to the resources, I end up bloating the PDF with
unnecessary images.
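A minimal sketch of one way to avoid the duplicates, assuming PDFBox 2.0.x: remember which image streams have already been converted, keyed on the original stream object itself, and give every later page that references the same stream the same JPEG replacement instead of encoding a new one. `SharedImageCompressor` and `compressPage` are hypothetical names, not part of PDFBox; `JPEGFactory.createFromImage` and `PDResources.put` are the same calls used in the code above.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

/** Hypothetical helper: re-encodes each distinct image stream once per document. */
final class SharedImageCompressor {

    // Maps an image stream (compared by object identity) to the JPEG that replaces it.
    // Each replacement is also mapped to itself, so a page whose resources already
    // point at the JPEG (e.g. a resources dictionary shared by several pages)
    // does not get it re-encoded.
    private final Map<COSBase, PDImageXObject> replacements = new IdentityHashMap<>();

    void compressPage(PDDocument document, PDPage page, float quality) throws IOException {
        PDResources resources = page.getResources();
        if (resources == null) {
            return;
        }
        // Copy the names first, because the XObject dictionary is modified below.
        List<COSName> names = new ArrayList<>();
        for (COSName name : resources.getXObjectNames()) {
            names.add(name);
        }
        for (COSName name : names) {
            PDXObject xObject = resources.getXObject(name);
            if (!(xObject instanceof PDImageXObject)) {
                continue; // form XObjects are skipped, as in the original code
            }
            COSBase original = xObject.getCOSObject();
            PDImageXObject replacement = replacements.get(original);
            if (replacement == null) {
                // First time this stream is seen: encode it once.
                BufferedImage image = ((PDImageXObject) xObject).getImage();
                replacement = JPEGFactory.createFromImage(document, image, quality);
                replacements.put(original, replacement);
                replacements.put(replacement.getCOSObject(), replacement);
            }
            // Every page that referenced the original now shares one JPEG stream.
            resources.put(name, replacement);
        }
    }
}
```

One `SharedImageCompressor` would be created per document and `compressPage` called for each page in place of `getImagesFromResources`. The map is keyed on object identity rather than on the resource name because names are local to each resources dictionary, so the same name on two pages need not mean the same image.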

Is there a way to see whether the page is actually using an image, and only process it if it
is? I tried finding matches in the page dictionary, and parsing the page stream and matching
on a dictionary there, to no avail. I have used the debugger to see that the resources are in
each page, although each page only displays one of the images.
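A page "knows" which image to use through its content stream: an XObject is painted only where the stream contains a `/Name Do` operator, and `Name` is looked up in the XObject part of the page's resources. Below is a minimal sketch, assuming PDFBox 2.0.x, that collects those names with `PDFStreamParser` so that only images actually painted on the page are processed. `UsedXObjects` and `namesUsedByPage` are hypothetical names; the sketch does not descend into form XObjects (which the code above also skips), and inline images (`BI`/`ID`/`EI`) are not XObjects, so they are not covered.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdmodel.PDPage;

/** Hypothetical helper: finds the XObjects a page actually paints. */
final class UsedXObjects {

    private UsedXObjects() {
    }

    /**
     * Returns the resource names used as the operand of a "Do" operator in the
     * page's own content stream(s). Names that are merely present in /Resources
     * are not included. Form XObjects are not searched recursively.
     */
    static Set<COSName> namesUsedByPage(PDPage page) throws IOException {
        Set<COSName> used = new HashSet<>();
        PDFStreamParser parser = new PDFStreamParser(page);
        Object previous = null;
        Object token;
        while ((token = parser.parseNextToken()) != null) {
            // "Do" takes exactly one operand, the XObject's resource name,
            // which is the token immediately before the operator.
            if (token instanceof Operator
                    && "Do".equals(((Operator) token).getName())
                    && previous instanceof COSName) {
                used.add((COSName) previous);
            }
            previous = token;
        }
        return used;
    }
}
```

In `getImagesFromResources` above, the loop could then skip any `xObjectName` that is not in the returned set (the page would have to be passed in alongside its resources).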

Thanks in advance for any advice/help.

Carl Buxbaum
Senior Software Architect
17 Rogers St
Gloucester, MA 01930
1-978-515-5128


