pdfbox-users mailing list archives

From Zhnupy Gonzalez <zhn...@gmail.com>
Subject remove all images from pdf
Date Thu, 08 Nov 2012 13:49:21 GMT
Hello everyone,
While looking for a way to remove all images from a PDF file, I found this:
http://stackoverflow.com/questions/6831194/how-can-i-remove-all-images-drawings-from-a-pdf-file-and-leave-text-only-in-java

which wasn't enough, so I ended up replacing each page's resources object
with a new (empty) one:
for (Object pageObj : catalog.getAllPages()) {
    PDPage page = (PDPage) pageObj;
    page.setResources(new PDResources());
}

which for my purposes is fine (there are some warnings when opening the
file with Acrobat Reader, but they don't interfere with my goal).

BUT there are still some images in the document and I don't know how to
remove them. I don't even know how to "navigate" to those images. My
guess is that I need to somehow traverse page.getCOSDictionary() and
delete the appropriate entries, but I haven't managed to do that and I'm
not sure it would work.
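
To make that guess concrete, here is a rough sketch of the kind of traversal
I have in mind. It is NOT PDFBox code: plain Java maps stand in for the COS
dictionaries, and the names ImageStripSketch and stripImages are made up for
illustration. It assumes images hang off a resources dictionary under an
"XObject" entry with "Subtype" set to "Image", and that non-image XObjects
(forms) may carry their own nested "Resources".

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical model: nested Maps play the role of PDF dictionaries.
public class ImageStripSketch {

    /**
     * Removes every XObject entry whose Subtype is "Image" and recurses into
     * the Resources of any other XObject. Returns the number of entries removed.
     */
    @SuppressWarnings("unchecked")
    public static int stripImages(Map<String, Object> resources) {
        Object x = resources.get("XObject");
        if (!(x instanceof Map)) {
            return 0; // no XObject dictionary, nothing to remove
        }
        Map<String, Object> xobjects = (Map<String, Object>) x;
        int removed = 0;
        // An explicit iterator lets us remove entries while walking the map.
        Iterator<Map.Entry<String, Object>> it = xobjects.entrySet().iterator();
        while (it.hasNext()) {
            Object value = it.next().getValue();
            if (!(value instanceof Map)) {
                continue;
            }
            Map<String, Object> xobj = (Map<String, Object>) value;
            if ("Image".equals(xobj.get("Subtype"))) {
                it.remove();
                removed++;
            } else if (xobj.get("Resources") instanceof Map) {
                // Assumed: form-like XObjects can nest their own resources.
                removed += stripImages((Map<String, Object>) xobj.get("Resources"));
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Object> nestedImage = new HashMap<>();
        nestedImage.put("Subtype", "Image");
        Map<String, Object> nestedXObjects = new HashMap<>();
        nestedXObjects.put("Im1", nestedImage);
        Map<String, Object> formResources = new HashMap<>();
        formResources.put("XObject", nestedXObjects);

        Map<String, Object> form = new HashMap<>();
        form.put("Subtype", "Form");
        form.put("Resources", formResources);
        Map<String, Object> image = new HashMap<>();
        image.put("Subtype", "Image");

        Map<String, Object> pageXObjects = new HashMap<>();
        pageXObjects.put("Im0", image);
        pageXObjects.put("Fm0", form);
        Map<String, Object> pageResources = new HashMap<>();
        pageResources.put("XObject", pageXObjects);

        System.out.println("removed " + stripImages(pageResources));
    }
}
```

Unlike replacing the whole resources object, this would leave other entries
(fonts and so on) in place. Whether the same walk can be done on the real
page dictionary is exactly what I don't know.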

Any help?
Regards
