pdfbox-users mailing list archives

From John Logan <john.lo...@texture.com>
Subject RE: Question about a feature
Date Thu, 10 Jan 2019 16:30:05 GMT
Hi Dorian,

I'd suggest starting with the RemoveAllText.java example to see the basic pattern for filtering
items from the PDF token stream.

What should work is to adapt this example to remove the "Do" operator and operands where
the corresponding PDXObject is an instance of PDImageXObject.

This will remove raster images but if you've got line art on the page, that will remain.
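
To make the filtering step concrete, here is a minimal, library-free sketch of the pattern. It is only a model, not the PDFBox API: tokens are plain strings, and the class name, method name, and the set of image names are all made up for illustration. In real code the tokens would come from the content stream parser as in RemoveAllText.java, and the "is this an image" check would be the instanceof PDImageXObject test on the XObject looked up by the operand name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical helper class; a simplified model of the token filter, not PDFBox code.
class RemoveImageDrawsSketch {

    // Copies the tokens into a new list, skipping every "<name> Do" pair
    // whose name is listed in imageNames (assumed to name image XObjects).
    static List<String> removeImageDraws(List<String> tokens, Set<String> imageNames) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            boolean drawsImage = "Do".equals(token)
                    && !kept.isEmpty()
                    && imageNames.contains(kept.get(kept.size() - 1));
            if (drawsImage) {
                // The operand (the XObject name) was already copied; drop it
                // together with the operator.
                kept.remove(kept.size() - 1);
                continue;
            }
            kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        // "/Im0" stands for an image XObject, "/Fm1" for a form XObject that must stay.
        List<String> tokens = Arrays.asList(
                "q", "200", "0", "0", "100", "50", "600", "cm", "/Im0", "Do", "Q",
                "BT", "(Hello)", "Tj", "ET",
                "/Fm1", "Do");
        System.out.println(removeImageDraws(tokens, Collections.singleton("/Im0")));
    }
}
```

In this model the "/Im0 Do" pair disappears while the text operators and the "/Fm1 Do" pair are left alone. The surrounding q ... cm ... Q tokens also remain; they are harmless once nothing is drawn between them.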


-----Original message-----
From: Dorian Messina
Sent: Thursday, January 10 2019, 5:41 am
To: users@pdfbox.apache.org
Subject: Question about a feature
First: thank you for PDFBox and all the time you spend working on it, to make our dev lives

I am using the library for the first time and I have one "how to" question.

I need to remove all pictures from a selectable PDF (one where I can select the text with the mouse).
Solutions exist on Stack Overflow (https://stackoverflow.com/questions/6831194/how-can-i-remove-all-images-drawings-from-a-pdf-file-and-leave-text-only-in-java)
and elsewhere, but the code is old and refers to methods that no longer exist.
Indeed, I am not able to find this miraculous method:


Does this feature still exist? Is there a simple way to fulfill my objective?

Thank you

Happy new year

Dorian Messina
Mobile : +32 493 02 63 57
d.messina@wavenet.be

Rue de l'artisanat, 16
7900 Leuze-en-Hainaut | Belgique
Tel : +32 69 67 03 35
www.wavenet.be
