pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: Extracting text
Date Wed, 08 Apr 2015 14:01:16 GMT
Hi,

> On 08.04.2015 at 15:48, Milan Tomic <tomicmilan@yahoo.com.INVALID> wrote:
> 
> Hi Maruan,
> Thank you very much for a quick response.
> Yes, I would like to search for the default text in form fields and replace it with my
> text. Is that possible with PDFBox? If not, do you know whether it is possible with some
> other PDF Java lib?

Yes, you can fill form fields with PDFBox.

For W9, this would be something along the lines of the following (using PDFBox 1.8):

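PDDocument doc = PDDocument.load(new File("W9.pdf"));
PDAcroForm form = doc.getDocumentCatalog().getAcroForm();
// look up the field by its name and set the new value
PDField field = form.getField("Name");
field.setValue("My New Name");
// write the result; the output file name is just an example
doc.save("W9-filled.pdf");
doc.close();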


> Another file that I uploaded (fax.pdf) doesn't contain form fields, I believe, but some
> strings in it, like "Pat Stumuller", cannot be extracted using PDFBox. Do you know why?
> Is there any workaround or solution?

For fax.pdf it's similar: that text is in a form field as well. The field you are looking for is called "To".

BR
Maruan


> Kind regards, Milan
> 
> 
> On Wednesday, April 8, 2015 3:25 PM, Maruan Sahyoun <sahyoun@fileaffairs.de> wrote:
> 
> 
> Hi
> 
>> On 08.04.2015 at 15:12, Milan Tomic <tomicmilan@yahoo.com.INVALID> wrote:
>> 
>> 
>> Here are the files, uploaded because the email server seems to have removed them:
>> http://www.filedropper.com/w9
>> 
>> http://www.filedropper.com/fax_1
>> Kind regards, Milan
>> 
>> 
>> 
>> On Wednesday, April 8, 2015 2:58 PM, Milan Tomic <tomicmilan@yahoo.com.INVALID> wrote:
>> 
>> 
>> Hello,
>> I am somewhat new to the PDF file format and I don't understand its structure. I am
>> attaching 2 PDFs that I have problems with.
> 
> The documents contain form fields.
> 
> 
>> The problem is that I cannot extract and replace data: person name or company name.
>> Some other text can be extracted, like field titles/descriptions.
> 
> Do you mean to extract/replace the text in the form fields, or other text contained in
> the document?
> 
>> 1. Why is some data text "hidden" and not accessible?
> 
> What do you mean by "hidden"?
> 
>> 2. Is there any way to transform a PDF into a "normal" PDF where all text is accessible
>> / parsable / replaceable?
>> I am trying to search a PDF for a string and to replace it.
> 
> PDF is a binary format - you can't do a simple search and replace.
> 
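To illustrate the point about PDF being a binary format: page content usually lives in compressed (FlateDecode, i.e. zlib/deflate) streams, so a string that is visible on the page normally does not appear literally in the file's bytes. The sketch below uses only the JDK, not PDFBox. The sample content stream and the class and method names are made up for the example; the operators (BT, Tf, Td, Tj, ET) are standard PDF text operators. Form field values, as in W9.pdf and fax.pdf, are stored separately from page content, so this shows only one of the reasons a raw search fails.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FlateSearchDemo {

    // Hypothetical page content: three text-showing operations.
    static final String SAMPLE_CONTENT =
            "BT /F1 12 Tf 72 720 Td (Pat Stumuller) Tj ET\n"
          + "BT /F1 12 Tf 72 700 Td (Acme Corp) Tj ET\n"
          + "BT /F1 12 Tf 72 680 Td (Fax cover sheet) Tj ET\n";

    // Compress bytes with zlib/deflate, the encoding behind /FlateDecode.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Reverse of deflate(); wraps the checked exception for brevity.
    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input: stop instead of looping
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Naive byte search: index of the first match, or -1 if there is none.
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] needle = "Pat Stumuller".getBytes(StandardCharsets.US_ASCII);
        byte[] packed = deflate(SAMPLE_CONTENT.getBytes(StandardCharsets.US_ASCII));
        byte[] unpacked = inflate(packed);

        System.out.println("found in compressed bytes: " + (indexOf(packed, needle) >= 0));
        System.out.println("found after inflating:     " + (indexOf(unpacked, needle) >= 0));
    }
}
```

The search only succeeds on the inflated bytes. A library such as PDFBox does this decoding for you, and after any change the stream has to be re-encoded and the file written out again by the library, which is why editing the file's bytes directly is not an option.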
>> Thank you in advance, Milan
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>> For additional commands, e-mail: users-help@pdfbox.apache.org
>> 
> 
> 
> 
> 



