pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: Text removal
Date Mon, 23 Mar 2015 21:48:02 GMT
Hi,

Your text is encoded, so within the show-text operator Tj the string is:

7R %H $SSURYHG
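Compare that operand with "To Be Approved". Every character is exactly 29 (0x1D) lower than the one you see on the page: T (0x54) becomes 7 (0x37), B (0x42) becomes % (0x25), and the space (0x20) becomes 0x03, which the archive presumably renders as a blank. That offset is consistent with a subset TrueType font whose codes are glyph IDs in the standard Macintosh glyph order (space = 3, A = 36). However, this is an inference from the two strings only. The authoritative mapping is the font's /Encoding or /ToUnicode entry. Below is a minimal Java sketch that assumes a fixed offset of 29 for this one font:

```java
public class GlyphOffset {
    // Offset observed between the visible text and the Tj operand in this
    // thread; an assumption that only holds for this one font.
    static final int OFFSET = 29;

    /** Shifts every character down by OFFSET, mimicking the observed operand. */
    static String toObservedCodes(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            sb.append((char) (c - OFFSET));
        }
        return sb.toString();
    }

    /** Reverses the shift so an operand can be compared with plain text. */
    static String fromObservedCodes(String codes) {
        StringBuilder sb = new StringBuilder(codes.length());
        for (char c : codes.toCharArray()) {
            sb.append((char) (c + OFFSET));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String codes = toObservedCodes("To Be Approved");
        // The space (0x20) becomes 0x03, a control character; show it as a blank.
        System.out.println(codes.replace((char) 3, ' '));
    }
}
```

Running main prints 7R %H $SSURYHG. A hard-coded offset only holds for this particular font. Another font in the same file may map codes completely differently.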

You wrote that you encode your string to find it. What do you get?
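The font might use a two-byte encoding. Identity-H is a common case and is assumed here, but it is not confirmed for this file. In that case, the search pattern has to be built from 2-byte codes rather than from the characters of the Java string, and a match should only count when it starts on a code boundary. In the sketch below, the char-to-code mapping is a hypothetical parameter. For a real document, it has to be derived from the font.

```java
import java.io.ByteArrayOutputStream;
import java.util.function.IntUnaryOperator;

public class TwoByteSearch {
    /** Encodes text as big-endian 2-byte codes using a caller-supplied (hypothetical) mapping. */
    static byte[] encode(String text, IntUnaryOperator charToCode) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (char c : text.toCharArray()) {
            int code = charToCode.applyAsInt(c);
            out.write((code >> 8) & 0xFF); // high byte
            out.write(code & 0xFF);        // low byte
        }
        return out.toByteArray();
    }

    /** Returns the first index of pattern in data that starts on a 2-byte boundary, or -1. */
    static int indexOf(byte[] data, byte[] pattern) {
        outer:
        for (int i = 0; i + pattern.length <= data.length; i += 2) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```

For example, with the hypothetical mapping `c -> c - 29`, `indexOf(encode("To Be Approved", map), encode("Be", map))` returns 6, the byte offset of the fourth code. Stepping by two bytes prevents a match from starting in the middle of a code.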

BR
Maruan



> On 23.03.2015 at 22:01, a7med shre3y <a7med.shre3y@gmail.com> wrote:
> 
> Hi Maruan,
> 
> Here's a link where you can download the PDF:
> 
> https://drive.google.com/file/d/0B5Kxacm1mej-bm82NzNvUXFPSmMtUjc0ZFVjVVlrODZnRzdn/view?usp=sharing
> 
> Kind Regards,
> a7mad
> 
> On Mon, Mar 23, 2015 at 8:57 PM, Maruan Sahyoun <sahyoun@fileaffairs.de>
> wrote:
> 
>> Hi,
>> 
>> You need to upload it to a public location, as the mailing list doesn't
>> support attachments.
>> 
>> BR
>> Maruan
>> 
>>> On 23.03.2015 at 19:18, a7med shre3y <a7med.shre3y@gmail.com> wrote:
>>> 
>>> Dear Maruan,
>>> 
>>> Thank you very much for the information. Please find attached the PDF
>>> to reproduce the problem.
>>> The text to remove is: "To Be Approved". The text has a multi-byte
>>> encoding, so I first encode it in order to find it, then remove it.
>>> 
>>> Best Regards,
>>> a7mad
>>> 
>>>> On Mon, Mar 23, 2015 at 4:13 PM, Maruan Sahyoun <sahyoun@fileaffairs.de> wrote:
>>>> Dear a7mad,
>>>> 
>>>> Removing text from a PDF is not an easy task, as:
>>>> - text which visually appears as a single item might consist of
>>>>   individual parts within the PDF itself, e.g. each character or group
>>>>   of characters is placed individually in a different COSString
>>>> - text might be drawn using graphics commands
>>>> - text can appear in different parts of the PDF (e.g. the text might be
>>>>   the content of a form field AND of the annotation representing the
>>>>   form field visually)
>>>> - you need to look up the encoding information to get from the
>>>>   characters in the PDF "string" to the ones you are looking for
>>>> …
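A visible phrase may be spread over several string operands, so searching each COSString on its own can miss it. The sketch below is minimal and rests on two assumptions. First, the string operands of consecutive show-text operators have already been collected as byte arrays in drawing order. Second, a decode function is available; it is hypothetical here, and in practice depends on the font's encoding or ToUnicode map. The sketch decodes and joins the operands, searches the joined text, and reports which operands would have to be edited.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class SplitTextSearch {
    /**
     * Decodes each string operand, joins the results, and returns the indices of
     * the operands that overlap the first occurrence of target (empty if absent).
     */
    static List<Integer> findFragments(List<byte[]> fragments,
                                       Function<byte[], String> decode,
                                       String target) {
        StringBuilder joined = new StringBuilder();
        List<Integer> starts = new ArrayList<>();
        for (byte[] fragment : fragments) {
            starts.add(joined.length());          // where this operand's text begins
            joined.append(decode.apply(fragment));
        }
        List<Integer> result = new ArrayList<>();
        int hit = joined.indexOf(target);
        if (hit < 0) {
            return result;
        }
        int end = hit + target.length();
        for (int i = 0; i < fragments.size(); i++) {
            int start = starts.get(i);
            int stop = (i + 1 < starts.size()) ? starts.get(i + 1) : joined.length();
            // Keep non-empty operands whose text range overlaps [hit, end).
            if (stop > start && start < end && stop > hit) {
                result.add(i);
            }
        }
        return result;
    }
}
```

This ignores positioning operators. Text drawn out of reading order, or with spaces produced by moving the text cursor instead of by a space glyph, would not be matched. The sketch only illustrates the split-string point.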
>>>> 
>>>> If you can post a specific PDF to a public location and describe in
>>>> detail which string should have been replaced but wasn't, I will be able
>>>> to tell you why that might have happened.
>>>> 
>>>> Maruan
>>>> 
>>>> 
>>>>> On 23.03.2015 at 15:03, a7med shre3y <a7med.shre3y@gmail.com> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> Currently I am facing a strange problem removing text from some PDFs.
>>>>> My program is able to find the text and "remove it" by calling the
>>>>> COSString.reset() method.
>>>>> The problem is that when I open the output PDF file, I still see the text,
>>>>> but it is not selectable (when I try to highlight it with the mouse to copy
>>>>> it, it can't be selected!). When I print the content (tokens) of the output
>>>>> file, I DO NOT find the text at all!
>>>>> 
>>>>> I am currently stuck in the PDF 1.5 specification and really running out
>>>>> of time.
>>>>> 
>>>>> I'd really appreciate any help or any idea on what's going on.
>>>>> 
>>>>> Notes:
>>>>> 1. I use PDFBox 1.7.1.
>>>>> 2. This problem does not occur with all PDFs; only some PDFs cause this
>>>>> problem.
>>>>> 
>>>>> Thank you very much.
>>>>> a7mad
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>> 
>>> 
>> 



