pdfbox-users mailing list archives

From Andreas Lehmkuehler <andr...@lehmi.de>
Subject Re: Help with removing images from a PDF
Date Tue, 16 Oct 2012 06:10:44 GMT
Hi,


On 15.10.2012 03:56, Nicholas Tiong wrote:
> Hi Andreas,
>
> I've commented out the 'do' line, but still cannot get rid of the images.
>
> I've basically opened the document and loaded the resources and then saved
> the document. See code below.
>
> This seems to be insufficient. Do I need to parse the PDF stream somehow?
Oops, I guess there was a misunderstanding. My idea won't work if you want to
remove the images permanently. But I have another one, see below.

> Regards,
> Nicholas Tiong
>
> import org.apache.pdfbox.exceptions.COSVisitorException;
> import org.apache.pdfbox.exceptions.CryptographyException;
> import org.apache.pdfbox.exceptions.InvalidPasswordException;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
> import org.apache.pdfbox.pdmodel.PDPage;
> import org.apache.pdfbox.pdmodel.PDResources;
> import org.apache.pdfbox.resources.*;
> import java.io.IOException;
>
> public class ExtractImages {
>      public static void main(String[] argv) throws COSVisitorException,
> InvalidPasswordException, CryptographyException, IOException {
>          PDDocument document = PDDocument.load("input.pdf");
>
>          if (document.isEncrypted()) {
>              document.decrypt("");
>          }
>
>          PDDocumentCatalog catalog = document.getDocumentCatalog();
>          for (Object pageObj :  catalog.getAllPages()) {
>              PDPage page = (PDPage) pageObj;
>              PDResources resources = page.findResources();

You have to remove all images from the resources dictionary. I have neither
tested nor compiled the code below, but it should make clear how it works.


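	// Needs imports for org.apache.pdfbox.cos.COSDictionary, org.apache.pdfbox.cos.COSName,
	// org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage, java.util.Iterator, java.util.Map.
	// Image XObjects are registered in the /XObject sub-dictionary of the
	// page resources, so the entries have to be removed from there.
	COSDictionary dictResources = resources.getCOSDictionary();
	COSDictionary dictXObjects =
		(COSDictionary) dictResources.getDictionaryObject(COSName.XOBJECT);
	Map<String,PDXObjectImage> images = resources.getImages();
	Iterator<String> iter = images.keySet().iterator();
	while( dictXObjects != null && iter.hasNext() )
	{
		dictXObjects.removeItem(COSName.getPDFName(iter.next()));
	}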

>
>
>          }
>
>          document.save("strippedOfImages.pdf");
>      }
> }
>
>
> SNIP

We should probably add a removeItem method to the PDResources class.
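
Such a method might behave roughly like the sketch below. This is only a
plain-Java model built on java.util maps, not the real PDFBox classes:
ResourcesModel and every method on it are hypothetical names made up for
illustration. What it tries to show is that image entries sit in the XObject
category of the resources, next to form XObjects and fonts, which have to stay.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java stand-in for a page's resource dictionary; NOT the PDFBox API. */
final class ResourcesModel {
    // Maps a category such as "XObject" or "Font" to its name -> subtype entries.
    private final Map<String, Map<String, String>> categories =
            new LinkedHashMap<String, Map<String, String>>();

    /** Registers a named resource with its subtype under the given category. */
    void put(String category, String name, String subtype) {
        Map<String, String> entries = categories.get(category);
        if (entries == null) {
            entries = new LinkedHashMap<String, String>();
            categories.put(category, entries);
        }
        entries.put(name, subtype);
    }

    /** Hypothetical removeItem: drops one named entry from one category. */
    boolean removeItem(String category, String name) {
        Map<String, String> entries = categories.get(category);
        return entries != null && entries.remove(name) != null;
    }

    boolean contains(String category, String name) {
        Map<String, String> entries = categories.get(category);
        return entries != null && entries.containsKey(name);
    }

    /** Names of all XObject entries whose subtype is "Image" (returned as a copy). */
    List<String> imageNames() {
        Map<String, String> xobjects = categories.get("XObject");
        if (xobjects == null) {
            xobjects = Collections.<String, String>emptyMap();
        }
        List<String> names = new ArrayList<String>();
        for (Map.Entry<String, String> entry : xobjects.entrySet()) {
            if ("Image".equals(entry.getValue())) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    /** Removes every image entry and returns how many were removed. */
    int removeAllImages() {
        int removed = 0;
        // imageNames() is a copy, so removing entries inside the loop is safe.
        for (String name : imageNames()) {
            if (removeItem("XObject", name)) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        ResourcesModel resources = new ResourcesModel();
        resources.put("XObject", "Im0", "Image");
        resources.put("XObject", "Im1", "Image");
        resources.put("XObject", "Fm0", "Form");
        resources.put("Font", "F1", "Type1");
        System.out.println(resources.removeAllImages());          // prints 2
        System.out.println(resources.contains("XObject", "Fm0")); // prints true
    }
}
```

In the real library the entries are COSName keys inside a COSDictionary, so an
actual removeItem on PDResources would look different. The model only captures
which entries go and which stay.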

BR
Andreas Lehmkühler
