Subject: Re: Text removal
From: Maruan Sahyoun
Date: Mon, 23 Mar 2015 22:48:02 +0100
To: users@pdfbox.apache.org

Hi,

your text is encoded, so within the show-text operator Tj the string is

7R %H $SSURYHG

You wrote that you encode your string to find it. What do you get?

BR
Maruan

> On 23.03.2015 at 22:01, a7med shre3y wrote:
>
> Hi Maruan,
>
> Here's a link from which you can download the PDF.
>
> https://drive.google.com/file/d/0B5Kxacm1mej-bm82NzNvUXFPSmMtUjc0ZFVjVVlrODZnRzdn/view?usp=sharing
>
> Kind Regards,
> a7mad
>
> On Mon, Mar 23, 2015 at 8:57 PM, Maruan Sahyoun wrote:
>
>> Hi,
>>
>> you need to upload it to a public location, as the mailing list doesn't
>> support attachments.
>>
>> BR
>> Maruan
>>
>>> On 23.03.2015 at 19:18, a7med shre3y wrote:
>>>
>>> Dear Maruan,
>>>
>>> Thank you very much for the information. Please find attached the PDF
>>> to reproduce the problem.
>>> The text to remove is: "To Be Approved". The text has a multi-byte
>>> encoding, so I first encode it in order to find it, then remove it.
>>>
>>> Best Regards,
>>> a7mad
>>>
>>>> On Mon, Mar 23, 2015 at 4:13 PM, Maruan Sahyoun wrote:
>>>> Dear a7mad,
>>>>
>>>> removing text from a PDF is not an easy task, as
>>>> - text which visually appears as a single item might consist of
>>>>   individual parts within the PDF itself, e.g. each character or group
>>>>   of characters is placed individually in a different COSString
>>>> - text might be drawn using graphics commands
>>>> - text can appear within different parts of the PDF (e.g. the text
>>>>   might be the content of a form field AND of the annotation
>>>>   representing the form field visually)
>>>> - you need to look up the encoding information to get from the
>>>>   characters in the PDF "string" to the ones you are looking for
>>>> …
>>>>
>>>> If you can post a specific PDF to a public location and describe in
>>>> detail which string should have been replaced but wasn't, I will be
>>>> able to tell you why that might have happened.
>>>>
>>>> Maruan
>>>>
>>>>
>>>>> On 23.03.2015 at 15:03, a7med shre3y wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Currently I am facing a strange problem removing text from some
>>>>> PDFs.
>>>>> My program is able to find the text and "remove it" by calling the
>>>>> COSString.reset() method.
>>>>> The problem is, when I open the output PDF file, I still see the text,
>>>>> but it is not selectable (I mean when I try to highlight it with the
>>>>> mouse to copy it, it's not selectable!). When I print the content
>>>>> (tokens) of the output file, I DO NOT find the text at all!!
>>>>>
>>>>> I am currently stuck in the PDF specification 1.5 and really running
>>>>> out of time.
>>>>>
>>>>> I'd so much appreciate any help or any idea on what's going on.
>>>>>
>>>>> Notes:
>>>>> 1. I use PDFBox 1.7.1
>>>>> 2. This problem does not occur with all PDFs, only some PDFs cause
>>>>> this problem.
>>>>>
>>>>> Thank you very much.
>>>>> a7mad
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>> For additional commands, e-mail: users-help@pdfbox.apache.org
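Maruan's reading of the Tj operand can be reproduced by hand. Comparing `7R %H $SSURYHG` with "To Be Approved" suggests that, in this particular file, each code is the character's Unicode value minus 29 (`T` = 0x54 becomes `7` = 0x37, `B` = 0x42 becomes `%` = 0x25), and each space becomes 0x03, a control character that the archive shows as a blank. The Java sketch below assumes exactly that constant offset. The offset was inferred from this single example, the class and method names are hypothetical, and a robust tool would derive the mapping from the font's /Encoding or /ToUnicode entry rather than hard-coding it.

```java
// Hypothetical sketch: map a search string to the codes that appear to be
// used in this one PDF (code = Unicode value - 29). The offset is an
// assumption inferred from the example in this thread, not a PDF rule.
public class OffsetEncodingSketch {

    // Assumed distance between a character's Unicode value and its code.
    static final int ASSUMED_OFFSET = 29;

    // Readable text -> string as it would appear inside the Tj operand.
    static String encode(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            sb.append((char) (c - ASSUMED_OFFSET));
        }
        return sb.toString();
    }

    // Raw Tj operand -> readable text (inverse of encode).
    static String decode(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (char c : raw.toCharArray()) {
            sb.append((char) (c + ASSUMED_OFFSET));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String encoded = encode("To Be Approved");
        // Spaces become U+0003; show them as blanks, as the archive does.
        System.out.println(encoded.replace('\u0003', ' ')); // prints 7R %H $SSURYHG
        System.out.println(decode(encoded));                // prints To Be Approved
    }
}
```

Searching for the encoded form only finds the phrase when it is stored in a single string operand; as noted earlier in the thread, the characters may instead be split across several COSString objects.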