pdfbox-users mailing list archives

From Zhnupy Gonzalez <zhn...@gmail.com>
Subject Re: remove all images from pdf
Date Thu, 08 Nov 2012 17:04:58 GMT
Juraj,
your answer made me look not at the streams but at the decoders, i.e. the
operator classes that interpret them. That is how I learned about PageDrawer
(a subclass of PDFStreamEngine), the class that ultimately renders a PDF page
to an image. I tried commenting out a few of the entries in
PageDrawer.properties until I succeeded: only when I removed
org.apache.pdfbox.util.operator.pagedrawer.LineTo were the icons I wanted to
skip gone.
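Why commenting out that entry works can be shown with a tiny, hypothetical model of an operator table (the class and method names below are made up for illustration and are not PDFBox API). In PDFBox 1.x, PDFStreamEngine appears to skip any operator that has no registered processor, so once `l` (line-to) has no handler, straight line segments never reach the drawing code. Note that this only changes the rendered image: the operators are still in the PDF file, and every straight segment on the page (not just the icons) stops being drawn.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical miniature of an operator table like the one PageDrawer
 * loads from PageDrawer.properties: each content-stream operator maps to
 * a handler, and operators without an entry are simply skipped.
 */
public class OperatorTableSketch {
    private final Map<String, Consumer<List<String>>> handlers = new HashMap<>();
    private final List<String> drawn = new ArrayList<>();

    public OperatorTableSketch(boolean lineToEnabled) {
        handlers.put("m", args -> drawn.add("moveTo " + args));
        if (lineToEnabled) {
            // the equivalent of leaving the LineTo entry uncommented
            handlers.put("l", args -> drawn.add("lineTo " + args));
        }
        handlers.put("S", args -> drawn.add("stroke"));
    }

    /** Each operation is its operands followed by the operator name. */
    public List<String> run(List<List<String>> operations) {
        for (List<String> op : operations) {
            String operator = op.get(op.size() - 1);
            Consumer<List<String>> handler = handlers.get(operator);
            if (handler != null) {
                handler.accept(op.subList(0, op.size() - 1));
            }
            // no handler registered: the operator is ignored, nothing is drawn
        }
        return drawn;
    }

    public static void main(String[] args) {
        List<List<String>> ops = Arrays.asList(
                Arrays.asList("10", "10", "m"),
                Arrays.asList("50", "10", "l"),
                Arrays.asList("S"));
        System.out.println(new OperatorTableSketch(true).run(ops));
        // prints [moveTo [10, 10], lineTo [50, 10], stroke]
        System.out.println(new OperatorTableSketch(false).run(ops));
        // prints [moveTo [10, 10], stroke]
    }
}
```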

thanks!


On Thu, Nov 8, 2012 at 8:00 AM, <jlonc@gi-bon.sk> wrote:

> Hi,
> there are several types of pictures:
>
> 1. bitmap images that are stored as resources
> 2. inline bitmap images that are stored within page's content stream
> 3. images that consist of curves/vectors - these vectors are defined
> within page's content stream
>
> Your example code removes only images of type #1.
> Removing types #2 and #3 is much harder: you need to parse the
> content stream, remove them, and write a new content stream.
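The "parse, remove, rebuild" route for types #2 and #3 can be sketched in plain Java. This is a minimal sketch, assuming the page's content stream has already been decoded to text (filters such as FlateDecode undone) and that inline image data never contains a whitespace-delimited `EI`; the class name `ContentStreamFilter` is hypothetical. It drops inline images (`BI ... ID ... EI`) and every path construction, painting and clipping operator, and writes all other operations back out one per line. In PDFBox itself the tokenizing would more likely be done with `PDFStreamParser` and the result written back with `ContentStreamWriter`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: strips inline images and vector paths from decoded content-stream text. */
public class ContentStreamFilter {
    // Path construction, path painting and clipping operators.
    private static final Set<String> PATH_OPERATORS = new HashSet<>(Arrays.asList(
            "m", "l", "c", "v", "y", "h", "re",
            "S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n", "W", "W*"));
    private static final Pattern NUMBER = Pattern.compile("[+-]?(\\d+\\.?\\d*|\\.\\d+)");
    // Assumption: inline image data never contains whitespace + "EI" + whitespace.
    private static final Pattern INLINE_IMAGE_END = Pattern.compile("\\sEI(?=\\s|$)");
    private static final String DELIMITERS = "()<>[]{}/%";

    public static String filter(String content) {
        StringBuilder out = new StringBuilder();
        List<String> operands = new ArrayList<>();
        int i = 0;
        int n = content.length();
        while (i < n) {
            char ch = content.charAt(i);
            int start = i;
            if (Character.isWhitespace(ch)) {
                i++;
            } else if (ch == '%') {                       // comment: skip to end of line
                while (i < n && content.charAt(i) != '\n' && content.charAt(i) != '\r') i++;
            } else if (ch == '(') {                       // literal string, nested parens allowed
                int depth = 0;
                do {
                    char c = content.charAt(i);
                    if (c == '\\') i++;                   // skip the escaped character
                    else if (c == '(') depth++;
                    else if (c == ')') depth--;
                    i++;
                } while (i < n && depth > 0);
                operands.add(content.substring(start, Math.min(i, n)));
            } else if (ch == '<' || ch == '>') {          // "<<", ">>" or <hex string>
                if (i + 1 < n && content.charAt(i + 1) == ch) {
                    i += 2;
                } else if (ch == '<') {
                    int close = content.indexOf('>', i);
                    i = close < 0 ? n : close + 1;
                } else {
                    i++;
                }
                operands.add(content.substring(start, i));
            } else if ("[]{}".indexOf(ch) >= 0) {         // array / procedure delimiters
                i++;
                operands.add(content.substring(start, i));
            } else {                                      // name, number, keyword or operator
                i++;
                while (i < n && !Character.isWhitespace(content.charAt(i))
                        && DELIMITERS.indexOf(content.charAt(i)) < 0) i++;
                String token = content.substring(start, i);
                if (ch == '/' || NUMBER.matcher(token).matches()
                        || token.equals("true") || token.equals("false") || token.equals("null")) {
                    operands.add(token);
                } else if (token.equals("BI")) {          // inline image starts: drop it
                    operands.clear();
                } else if (token.equals("ID")) {          // skip the raw image data up to EI
                    Matcher m = INLINE_IMAGE_END.matcher(content);
                    i = m.find(Math.min(i + 1, n)) ? m.end() : n;
                    operands.clear();
                } else {                                  // any other operator
                    if (!PATH_OPERATORS.contains(token)) {
                        for (String operand : operands) out.append(operand).append(' ');
                        out.append(token).append('\n');
                    }
                    operands.clear();
                }
            }
        }
        return out.toString();
    }
}
```

For example, filtering `0 0 m 100 100 l S BT /F1 12 Tf (Hi) Tj ET` keeps only the text operations, `BT`, `/F1 12 Tf`, `(Hi) Tj` and `ET`, one per line. Dropping the clipping operators (`W`, `W*`) together with their paths keeps the stream valid, but content that was previously clipped may become visible.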
>
>
> Best regards
> Juraj Lonc
>
>
>
>
> From:   Zhnupy Gonzalez <zhnupy@gmail.com>
> To:     users@pdfbox.apache.org,
> Date:   08. 11. 2012 14:50
> Subject:        remove all images from pdf
>
>
>
> Hello everyone,
> While looking for a way to remove all images from a PDF file, I found this:
>
> http://stackoverflow.com/questions/6831194/how-can-i-remove-all-images-drawings-from-a-pdf-file-and-leave-text-only-in-java
>
>
> which wasn't enough, so I ended up replacing the page's resource object
> with a new (empty) one:
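> // catalog = document.getDocumentCatalog() (PDFBox 1.x API)
> for (Object pageObj : catalog.getAllPages()) {
>     PDPage page = (PDPage) pageObj;
>     // an empty PDResources drops every XObject (images, forms),
>     // but also the page's fonts, color spaces, etc.
>     page.setResources(new PDResources());
> }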
>
> which for my purposes is fine (there are some warnings when opening the
> file with Acrobat Reader, but they don't interfere with my goal).
>
> BUT, there are still some images in the document and I don't know how to
> tear them out. I don't even know how to "navigate" to those images. My
> guess is that I need to somehow traverse page.getCOSDictionary() and
> delete the appropriate entries, but I haven't managed to do that, and I'm
> also not sure it would work.
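To "navigate" from the page dictionary to its images, the path is `/Resources` → `/XObject` → entries whose `/Subtype` is `/Image`; form XObjects (`/Subtype /Form`) carry their own `/Resources` and can hide further images. The sketch below models dictionaries as plain Java Maps, a hypothetical stand-in for PDFBox's COSDictionary, so it ignores indirect references, reference cycles, and resources inherited from the parent `/Pages` node. Removing only the image entries keeps the fonts intact, though the content stream still contains the matching `Do` operators, which a viewer may warn about.

```java
import java.util.Iterator;
import java.util.Map;

/** Hypothetical sketch: prunes image XObjects from Map-modelled PDF dictionaries. */
public class XObjectPruner {
    /**
     * Removes image XObjects from a page-like dictionary and, recursively,
     * from the resources of any form XObjects it uses.
     * Returns the number of images removed. No cycle detection.
     */
    @SuppressWarnings("unchecked")
    public static int removeImages(Map<String, Object> dict) {
        Object resources = dict.get("Resources");
        if (!(resources instanceof Map)) return 0;
        Object xobjects = ((Map<String, Object>) resources).get("XObject");
        if (!(xobjects instanceof Map)) return 0;
        int removed = 0;
        Iterator<Map.Entry<String, Object>> it =
                ((Map<String, Object>) xobjects).entrySet().iterator();
        while (it.hasNext()) {
            Object value = it.next().getValue();
            if (!(value instanceof Map)) continue;
            Map<String, Object> xobject = (Map<String, Object>) value;
            if ("Image".equals(xobject.get("Subtype"))) {
                it.remove();                            // drop the image entry itself
                removed++;
            } else if ("Form".equals(xobject.get("Subtype"))) {
                removed += removeImages(xobject);       // forms have their own /Resources
            }
        }
        return removed;
    }
}
```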
>
> Any help?
> Regards
>
>
