pdfbox-users mailing list archives

From Milan Tomic <tomicmi...@yahoo.com.INVALID>
Subject Re: Extracting text
Date Wed, 08 Apr 2015 13:48:03 GMT
Hi Maruan,
Thank you very much for the quick response.
Yes, I would like to search for the default text in form fields and replace it with my own text.
Is that possible with PDFBox? If not, do you know whether it is possible with some other Java
PDF library?
The other file that I uploaded (fax.pdf) does not, I believe, contain form fields, but some strings
in it, such as "Pat Stumuller", cannot be extracted using PDFBox. Do you know why?
Is there any workaround or solution?
Kind regards,
Milan


     On Wednesday, April 8, 2015 3:25 PM, Maruan Sahyoun <sahyoun@fileaffairs.de> wrote:
   

 Hi

> On 08.04.2015 at 15:12, Milan Tomic <tomicmilan@yahoo.com.INVALID> wrote:
> 
> 
> Here are the files, uploaded because the email server seems to have removed them:
> http://www.filedropper.com/w9
> 
> http://www.filedropper.com/fax_1
> Kind regards,
> Milan
> 
> 
> 
>    On Wednesday, April 8, 2015 2:58 PM, Milan Tomic <tomicmilan@yahoo.com.INVALID> wrote:
> 
> 
> Hello,
> I am somewhat new to the PDF file format and I don't understand its structure. I am attaching
> 2 PDFs that I have problems with.

The documents contain form fields.


> The problem is that I cannot extract and replace data such as the person name or company name.
> Some other text, like field titles/descriptions, can be extracted.

Do you mean extracting/replacing the text in the form fields, or other text contained in the
document?

> 1. Why is some data text "hidden" and not accessible?

What do you mean by "hidden"?

> 2. Is there any way to transform a PDF into a "normal" PDF where all text is accessible
> / parsable / replaceable?
> I am trying to search a PDF for a string and replace it.

PDF is a binary format - you can't do a simple search and replace.
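
To illustrate why a byte-level search usually fails: page content in a PDF is commonly stored in compressed (/FlateDecode) streams, so the visible string does not appear literally in the file. The following is only a minimal sketch using the JDK's java.util.zip classes, not PDFBox. The class name FlateSearchDemo, its helper methods and the sample content stream are made up for illustration. Real files can add further obstacles, such as font-specific encodings, which this sketch does not cover.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class: shows that text inside a deflate-compressed
// stream is not visible to a naive byte-level search.
class FlateSearchDemo {

    // Compress bytes with zlib/deflate, the scheme behind /FlateDecode.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress bytes produced by deflate().
    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input: stop instead of looping forever
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("not a valid deflate stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Naive search: treat the bytes as Latin-1 text and look for the string.
    static boolean contains(byte[] haystack, String needle) {
        return new String(haystack, StandardCharsets.ISO_8859_1).contains(needle);
    }

    public static void main(String[] args) {
        // Made-up content stream that draws the string "Pat Stumuller".
        String content = "BT /F1 12 Tf 72 720 Td (Pat Stumuller) Tj ET";
        byte[] stored = deflate(content.getBytes(StandardCharsets.ISO_8859_1));

        System.out.println("found in stored bytes: " + contains(stored, "Pat Stumuller"));
        System.out.println("found after inflating: " + contains(inflate(stored), "Pat Stumuller"));
    }
}
```

A PDF library does this decoding (and more) before it can see any text, which is why text has to be read and changed through the library's API rather than by editing the file's bytes.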

> Thank you in advance,
> Milan
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
> 




  
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message