pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: Extracting text
Date Wed, 08 Apr 2015 13:25:06 GMT
Hi

> On 08.04.2015 at 15:12, Milan Tomic <tomicmilan@yahoo.com.INVALID> wrote:
> 
> 
> Here are the files, uploaded because the email server seems to have removed them:
> http://www.filedropper.com/w9
> 
> http://www.filedropper.com/fax_1
> Kind regards, Milan
> 
> 
> 
>     On Wednesday, April 8, 2015 2:58 PM, Milan Tomic <tomicmilan@yahoo.com.INVALID> wrote:
> 
> 
> Hello,
> I am somewhat new to the PDF file format and I don't understand its structure. I am attaching
> 2 PDFs that I have problems with.

The documents contain form fields.


> The problem is that I cannot extract and replace data such as the person name or company name.
> Some other text, like field titles/descriptions, can be extracted.

Do you mean extracting/replacing the text in the form fields, or other text contained in the
document?

> 1. Why is some data text "hidden" and not accessible?

What do you mean by "hidden"?

> 2. Is there any way to transform a PDF into a "normal" PDF where all text is accessible
> / parsable / replaceable?
> I am trying to search a PDF for a string and to replace it.

PDF is a binary format - you can't do a simple search and replace.
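
As a rough illustration of why a byte-level search tends to fail: page content in a PDF is commonly stored in compressed streams (the FlateDecode filter, which is zlib/deflate). The sketch below uses only the JDK's java.util.zip classes, not PDFBox. The content-stream snippet and the name "John Smith" are made up for the example, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateSearchDemo {

    // Compress bytes with zlib/deflate, the algorithm behind PDF's FlateDecode filter.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Reverse of deflate(): recover the original bytes.
    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated input, stop instead of looping forever
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Naive byte-by-byte search, i.e. what a "simple search" over the file would do.
    static boolean contains(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up content stream fragment: show the string "John Smith" on the page.
        byte[] raw = "BT /F1 12 Tf 72 700 Td (John Smith) Tj ET"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] needle = "John Smith".getBytes(StandardCharsets.US_ASCII);
        byte[] stored = deflate(raw); // roughly what would sit in the file

        System.out.println("found in uncompressed stream: " + contains(raw, needle));
        System.out.println("found in compressed bytes:    " + contains(stored, needle));
        System.out.println("found after decompressing:    " + contains(inflate(stored), needle));
    }
}
```

Even after decompression, a plain string match is not guaranteed to work: text may be split across several show operators or use a font-specific encoding, and form field values are stored in the field dictionaries rather than in the page content.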

> Thank you in advance, Milan
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
> 

