pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: Text removal
Date Mon, 23 Mar 2015 15:13:27 GMT
Dear a7mad,

Removing text from a PDF is not an easy task, as
- text which visually appears as a single item might consist of individual parts within
the PDF itself, e.g. each character or group of characters is placed individually in different
- text might be drawn using graphics commands
- text can appear within different parts of the PDF (e.g. the text might be the content of a form
field AND of the annotation representing the form field visually)
- you need to look up the encoding information to get from the characters in the PDF "string"
to the ones you are looking for
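
To illustrate the first point, here is a minimal sketch in plain Java. It does not use the PDFBox API. It assumes a simplified model in which the operands of a single TJ text-showing operator are held in a list, with String entries for the text fragments and Number entries for the kerning adjustments between them. The class and method names are hypothetical, and the fragments are assumed to be already decoded to Unicode, which in a real PDF requires the font's encoding information.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model, NOT the PDFBox API: String entries are text fragments,
// Number entries are kerning adjustments, as in the operand array of a TJ operator.
final class TjArraySketch {

    /** Naive search: checks each fragment on its own, as a per-string search would. */
    static boolean anyFragmentContains(List<Object> operands, String target) {
        for (Object operand : operands) {
            if (operand instanceof String && ((String) operand).contains(target)) {
                return true;
            }
        }
        return false;
    }

    /** Joins all text fragments, skipping the numeric kerning adjustments. */
    static String joinFragments(List<Object> operands) {
        StringBuilder sb = new StringBuilder();
        for (Object operand : operands) {
            if (operand instanceof String) {
                sb.append((String) operand);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One visible word, stored as three fragments with kerning in between.
        List<Object> operands = Arrays.<Object>asList("Con", -20, "fiden", 15, "tial");

        // No single fragment holds the whole word, so the naive search misses it.
        System.out.println(anyFragmentContains(operands, "Confidential")); // false

        // The word is only found once the fragments are joined.
        System.out.println(joinFragments(operands).contains("Confidential")); // true
    }
}
```

A search that inspects one string at a time can therefore miss text that is clearly visible on the page. Text that is drawn with graphics commands, or repeated in an annotation's appearance, is not covered by this sketch at all.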

If you can post a specific PDF to a public location and describe in detail which string should
have been replaced but wasn't, I will be able to tell you why that might have happened.


> On 23.03.2015 at 15:03, a7med shre3y <a7med.shre3y@gmail.com> wrote:
> Hi all,
> Currently I am facing a strange problem removing text from some PDFs.
> My program is able to find the text and "remove it" by calling the
> COSString.reset() method.
> The problem is, when I open the output PDF file, I still see the text, but
> it is not selectable (I mean when I try to highlight it with the mouse to copy
> it, it's not selectable!). When I print the content (tokens) of the output
> file, I DO NOT find the text at all!!
> I am currently stuck in the PDF 1.5 specification and am really running out
> of time.
> I'd so much appreciate any help or any idea on what's going on.
> Notes:
> 1. I use PDFBox 1.7.1
> 2. This problem does not occur with all PDFs, only some PDFs cause this
> problem.
> Thank you very much.
> a7mad

