pdfbox-users mailing list archives

From jl...@gi-bon.sk
Subject Re: How can I manipulate text in PDF'd by using PDFBox
Date Sat, 01 Sep 2012 08:49:53 GMT
Hi Mac,

You can use PDFTextStripper for this.
It returns all the text from the pages, and with setStartPage and
setEndPage you can limit it to a single page.
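
Once you have the text of one page as a String, picking out the last name and building the file name is plain Java. Below is only a rough sketch, not tested against your file. It assumes each page has a line like "Name: John Smith"; that label, the class name PageNamer and the "master" base name are made up for the example, so adjust them to your document. The PDFBox calls appear only in the comments; PDFBox also has a Splitter class that can cut the document into one-page documents, which you could then save under these names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PageNamer {
    // Assumption: every page carries a line such as "Name: John Smith".
    private static final Pattern NAME_LINE =
            Pattern.compile("(?m)^[ \\t]*Name:[ \\t]*(\\S.*?)[ \\t]*$");

    /** Returns the last word of the name line, or null if the page has none. */
    public static String lastNameOf(String pageText) {
        Matcher m = NAME_LINE.matcher(pageText);
        if (!m.find()) {
            return null;
        }
        String[] parts = m.group(1).split("\\s+");
        return parts[parts.length - 1];
    }

    /** Builds e.g. "master_Smith.pdf"; falls back to the page number. */
    public static String fileNameFor(String baseName, String pageText, int pageNumber) {
        String last = lastNameOf(pageText);
        String suffix = (last == null)
                ? String.valueOf(pageNumber)
                // Replace characters that are awkward in file names.
                : last.replaceAll("[^\\p{L}\\p{N}_-]", "_");
        return baseName + "_" + suffix + ".pdf";
    }

    public static void main(String[] args) {
        // In real use the text would come from PDFBox, one page at a time:
        //   stripper.setStartPage(p);
        //   stripper.setEndPage(p);
        //   String text = stripper.getText(document);
        String text = "Invoice\nName: John Smith\nTotal: 10";
        System.out.println(fileNameFor("master", text, 1));
    }
}
```

Pages without a name line keep the page number in the file name, so no page gets lost. Two people with the same last name would get the same file name, so you may want to append the page number in that case as well.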

Best regards
Juraj Lonc

GI-BÓN, spol. s r.o.
Management Systems

Bratislavská 11
SK - 010 01 Žilina
Tel: +421-41-564 3437-8
Mobil: +421-907-815 147
Fax: +421-41-564 3439
e-mail: jlonc@gi-bon.sk
homepage: http://www.gi-bon.sk 

From:   Mac P <pons32@hotmail.com>
To:     pdfbox <users@pdfbox.apache.org>, 
Date:   01. 09. 2012 10:02
Subject:        How can I manipulate text in PDF'd by using PDFBox

Hello Forum

Is there any way to split a master PDF file consisting of many pages 
into separate pages based on the content or keywords on each page?

Each page has a person's first and last name. I would like to grep the 
last name and write a script that separates each page and turns it into a 
new PDF file with the last name as part of the file name, instead of a 
sequential page number at the end of each file name.

I know PDFs are binary documents. Are there any tools to look up the last 
names and manipulate them that way?


