pdfbox-users mailing list archives

From a7med shre3y <a7med.shr...@gmail.com>
Subject Re: Text removal
Date Mon, 23 Mar 2015 21:01:24 GMT
Hi Maruan,

Here's a link where you can download the PDF.

https://drive.google.com/file/d/0B5Kxacm1mej-bm82NzNvUXFPSmMtUjc0ZFVjVVlrODZnRzdn/view?usp=sharing

Kind Regards,
a7mad

On Mon, Mar 23, 2015 at 8:57 PM, Maruan Sahyoun <sahyoun@fileaffairs.de>
wrote:

> Hi,
>
> you need to upload it to a public location as the mailing list doesn't
> support attachments.
>
> BR
> Maruan
>
> > On 23.03.2015 at 19:18, a7med shre3y <a7med.shre3y@gmail.com> wrote:
> >
> > Dear Maruan,
> >
> > Thank you very much for the information. Please find attached the PDF
> > to reproduce the problem.
> > The text to remove is: "To Be Approved". The text uses a multi-byte
> > encoding, so I first encode the search string in order to find it and
> > then remove it.
> >
> > Best Regards,
> > a7mad
> >
> >> On Mon, Mar 23, 2015 at 4:13 PM, Maruan Sahyoun <sahyoun@fileaffairs.de>
> wrote:
> >> Dear a7mad,
> >>
> >> removing text from a PDF is not an easy task, as
> >> - text which visually appears as a single item might consist of
> >>   individual parts within the PDF itself, e.g. each character or group
> >>   of characters is placed individually in a different COSString
> >> - text might be drawn using graphics commands
> >> - text can appear within different parts of the PDF (e.g. the text
> >>   might be the content of a form field AND of the annotation
> >>   representing the form field visually)
> >> - you need to look up the encoding information to get from the
> >>   characters in the PDF "string" to the ones you are looking for
> >> ….
> >>
> >> If you can post a specific PDF to a public location and describe in
> >> detail which string should have been replaced but was not, I will be
> >> able to tell you why that might have happened.
> >>
> >> Maruan
> >>
> >>
> >> > On 23.03.2015 at 15:03, a7med shre3y <a7med.shre3y@gmail.com> wrote:
> >> >
> >> > Hi all,
> >> >
> >> > Currently I am facing a strange problem removing text from some PDFs.
> >> > My program is able to find the text and "remove it" by calling the
> >> > COSString.reset() method.
> >> > The problem is that when I open the output PDF file, I still see the
> >> > text, but it is not selectable (when I try to highlight it with the
> >> > mouse to copy it, nothing is selected). When I print the content
> >> > (tokens) of the output file, I do NOT find the text at all!
> >> >
> >> > I am currently stuck in the PDF 1.5 specification and am really
> >> > running out of time.
> >> >
> >> > I'd so much appreciate any help or any idea on what's going on.
> >> >
> >> > Notes:
> >> > 1. I use PDFBox 1.7.1
> >> > 2. This problem does not occur with all PDFs; only some PDFs cause
> >> >    it.
> >> >
> >> > Thank you very much.
> >> > a7mad
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> >> For additional commands, e-mail: users-help@pdfbox.apache.org
> >
> >
>
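
To make two of the points in Maruan's list concrete (text split across several strings, and the need to go through the encoding before comparing), here is a minimal sketch in plain Java. It deliberately does not use the PDFBox API: the class `FragmentedTextSketch`, its methods, and the 2-byte code table are all hypothetical stand-ins, and a real font's encoding is more involved than a simple map. The idea is only that you decode every fragment first, search the concatenated text, and then work out which fragments the match touches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only: none of these names come from PDFBox.
final class FragmentedTextSketch {

    // Decode 2-byte codes into text via a code-to-character table
    // (an assumed stand-in for the font's encoding information).
    static String decode(byte[] codes, Map<Integer, Character> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < codes.length; i += 2) {
            int code = ((codes[i] & 0xFF) << 8) | (codes[i + 1] & 0xFF);
            Character ch = toUnicode.get(code);
            sb.append(ch != null ? ch : '\uFFFD'); // unknown code
        }
        return sb.toString();
    }

    // Return the indexes of all fragments that overlap the first occurrence
    // of target in the concatenated, decoded text (empty list if not found).
    static List<Integer> fragmentsCovering(List<byte[]> fragments,
                                           Map<Integer, Character> toUnicode,
                                           String target) {
        StringBuilder all = new StringBuilder();
        int[] starts = new int[fragments.size()];
        int[] ends = new int[fragments.size()];
        for (int i = 0; i < fragments.size(); i++) {
            starts[i] = all.length();
            all.append(decode(fragments.get(i), toUnicode));
            ends[i] = all.length();
        }
        List<Integer> hits = new ArrayList<>();
        int pos = all.indexOf(target);
        if (pos < 0) {
            return hits;
        }
        int end = pos + target.length();
        for (int i = 0; i < fragments.size(); i++) {
            if (starts[i] < end && ends[i] > pos) {
                hits.add(i);
            }
        }
        return hits;
    }

    // Demo helper: turn text into 2-byte codes using the reverse table.
    static byte[] encode(String s, Map<Character, Integer> toCode) {
        byte[] out = new byte[s.length() * 2];
        for (int i = 0; i < s.length(); i++) {
            int code = toCode.get(s.charAt(i));
            out[2 * i] = (byte) (code >> 8);
            out[2 * i + 1] = (byte) code;
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up code table covering just the characters in the demo text.
        String alphabet = "To BeAprvd";
        Map<Integer, Character> toUnicode = new HashMap<>();
        Map<Character, Integer> toCode = new HashMap<>();
        for (int i = 0; i < alphabet.length(); i++) {
            toUnicode.put(i + 1, alphabet.charAt(i));
            toCode.put(alphabet.charAt(i), i + 1);
        }
        // The visible phrase is stored as three separate strings.
        List<byte[]> fragments = new ArrayList<>();
        fragments.add(encode("To B", toCode));
        fragments.add(encode("e Appr", toCode));
        fragments.add(encode("oved", toCode));
        System.out.println(
            fragmentsCovering(fragments, toUnicode, "To Be Approved"));
    }
}
```

In the demo, no single fragment contains "To Be Approved", so a search that inspects one string at a time would miss it, while the concatenated search reports all three fragments. This sketch says nothing about the other two cases in the list (text drawn with graphics commands, or text that also lives in a form field's appearance); those need to be checked in the PDF itself.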
