From: jlonc@gi-bon.sk
To: users@pdfbox.apache.org
Date: Sat, 1 Sep 2012 10:49:53 +0200
Subject: Re: How can I manipulate text in PDF'd by using PDFBox

Hi Mac,

you can use PDFTextStripper for this. It will return all the text from the pages.

Best regards

Juraj Lonc
GI-BÓN, spol. s r.o.
Management Systems
Bratislavská 11
SK - 010 01 Žilina
Tel: +421-41-564 3437-8
Mobil: +421-907-815 147
Fax: +421-41-564 3439
e-mail: jlonc@gi-bon.sk
homepage: http://www.gi-bon.sk

From: Mac P
To: pdfbox
Date: 01. 09. 2012 10:02
Subject: How can I manipulate text in PDF'd by using PDFBox

Hello Forum

Is there any way to split a master PDF file consisting of many pages into separate pages based on the content or keywords in each page?

Each page has the person's first and last name. I would like to grep the last name and write a script to separate each page, turning it into a new PDF file with the last name as part of the file name, instead of sequential numbers matching the total number of pages at the end of each file name.

I know PDFs are binary documents. Are there any tools to look up the last names and manipulate them that way?

Thanks
Mac
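
The naming step of the approach discussed in this thread could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested PDFBox recipe: the class PageFileNamer, its methods, and the "Name: First Last" line layout are all hypothetical, since the thread does not say how the name appears on each page. The sketch uses only the Java standard library and takes the per-page text as plain strings. With PDFBox, that text would typically come from PDFTextStripper by calling setStartPage(n) and setEndPage(n) before getText(document), and the single-page documents from the Splitter class.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: builds one output file name per page from that page's text. */
final class PageFileNamer {

    // ASSUMPTION: each page has a line such as "Name: John Smith" (exactly two words).
    // Adjust this pattern to whatever the real pages contain.
    private static final Pattern NAME_LINE =
            Pattern.compile("(?m)^[ \\t]*Name:[ \\t]*(\\S+)[ \\t]+(\\S+)[ \\t]*$");

    private PageFileNamer() {}

    /** Returns the last name from the first matching name line, or null if none matches. */
    static String lastNameOf(String pageText) {
        Matcher m = NAME_LINE.matcher(pageText);
        return m.find() ? m.group(2) : null;
    }

    /** Replaces every character other than letters, digits, '_' and '-' with '_'. */
    static String sanitize(String s) {
        return s.replaceAll("[^\\p{L}\\p{N}_-]", "_");
    }

    /**
     * pageTexts.get(i) holds the extracted text of page i + 1.
     * Pages without a recognizable name line fall back to "page-<n>.pdf";
     * a repeated last name gets the page number appended.
     */
    static List<String> fileNamesFor(List<String> pageTexts) {
        List<String> names = new ArrayList<>();
        Set<String> used = new HashSet<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            int pageNumber = i + 1;
            String lastName = lastNameOf(pageTexts.get(i));
            String base = (lastName == null) ? "page-" + pageNumber : sanitize(lastName);
            String name = base + ".pdf";
            if (!used.add(name)) {
                // Same last name seen on an earlier page: disambiguate by page number.
                name = base + "-" + pageNumber + ".pdf";
                used.add(name);
            }
            names.add(name);
        }
        return names;
    }
}
```

Each single-page document would then be saved under the name at the same index in the returned list. Falling back to a page number when no name is found keeps a page from being silently dropped if the text extraction or the pattern misses it.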