pdfbox-users mailing list archives

From Roberto Nibali <rnib...@gmail.com>
Subject Re: Migrate form field entries from one pdf to another
Date Tue, 07 Jul 2015 08:39:31 GMT
Hi Maruan


>
> > This is highly confusing. Why can Acrobat deal with those checkboxes when
> > their value is null, and why can't PDFBox set checkbox values?
> >
> > How can I simply clone all static PDF form entries of a PDF into a new
> > PDF? Is PDF really so complex that such a simple thing is not possible?
> > Right now, only text form entries are copied; the rest shows null for
> > getValue().
>
> The reason that getValue() returns null is that no value is set for the
> filled-out form field (the value is held in the /V entry of the field
> dictionary). But isChecked() returns true because the checkbox has been
> checked; this is based on the appearance state of the checkbox.
>
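A minimal plain-Java sketch of the distinction described above, assuming a checkbox whose field dictionary has no /V entry while its widget's appearance state (/AS) names the "on" appearance. The class CheckBoxState and its methods are hypothetical and only mirror the semantics of getValue() and isChecked(); this is not PDFBox code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a checkbox (NOT the PDFBox API): the field's /V entry and
// the widget's /AS entry are stored separately, as they are in a PDF file.
class CheckBoxState {
    // Field dictionary entries, e.g. "V" -> "Yes"; "V" may be absent.
    final Map<String, String> fieldDict = new HashMap<>();
    // Widget annotation entries, e.g. "AS" -> "Yes" or "Off".
    final Map<String, String> widgetDict = new HashMap<>();

    // Like getValue(): the /V entry, or null when none has been set.
    String getValue() {
        return fieldDict.get("V");
    }

    // Like isChecked(): decided by the appearance state, not by /V.
    boolean isChecked() {
        String state = widgetDict.get("AS");
        return state != null && !state.equals("Off");
    }
}
```

A form filler that writes only /AS leaves the field in exactly this state: getValue() returns null while isChecked() returns true.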

I see; slowly I'm getting the gist here. PDF truly is a tricky format, and
the Acrobat tools hide that very well from everyday users.


> To give you a quick explanation: when a form field is filled out, the
> value of the form field has to be set. But that alone doesn't give you any
> visual information. To add the visual information, the form field has an
> annotation (a widget) assigned to it, which has what's called an
> appearance. The appearance is what's visible on screen or when the PDF is
> printed.
>
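The split between value and appearance can be sketched as follows, assuming the PDF convention that a checkbox's normal appearance dictionary (/AP /N) holds an "Off" state plus one "on" state whose name is chosen by the authoring tool. CheckBoxWidget is a hypothetical model, not a PDFBox class; its check() method stands for a well-behaved filler that updates both the data (/V) and the visuals (/AS).

```java
import java.util.Set;

// Hypothetical model (NOT PDFBox): a checkbox field whose widget annotation
// carries a normal-appearance dictionary (/AP /N) with one stream per state.
class CheckBoxWidget {
    String fieldValue;               // the field dictionary's /V entry (null if absent)
    String appearanceState = "Off";  // the widget's /AS entry
    // Keys of /AP /N: "Off" plus the "on" state name picked by the authoring tool.
    final Set<String> normalAppearanceStates;

    CheckBoxWidget(Set<String> normalAppearanceStates) {
        this.normalAppearanceStates = normalAppearanceStates;
    }

    // The "on" value is whichever appearance state is not "Off" (e.g. "Yes", "On").
    String onValue() {
        for (String state : normalAppearanceStates) {
            if (!state.equals("Off")) {
                return state;
            }
        }
        throw new IllegalStateException("no 'on' appearance state defined");
    }

    // Sets the value AND the appearance state together.
    void check() {
        String on = onValue();
        fieldValue = on;       // data: what getValue() would report
        appearanceState = on;  // visuals: which appearance is shown and printed
    }
}
```

Setting only appearanceState, and not fieldValue, reproduces the situation discussed in this thread.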

Understood, although at first glance this seems an overly complex
architecture. I'm sure there are reasons for it. Thanks to your
explanation, I'm finally starting to see the bigger picture.


> Normally an application sets the value AND the appearance when the form
> field is filled. In your case the form-filling application hasn't set the
> field value (that's why getValue() returns null) and has ONLY updated the
> appearance.
>

One of the applications used to create the form fields is InDesign, a
notoriously bad choice for that. I created the test PDFs with Adobe Acrobat
Pro for Mac, which I downloaded for a one-month evaluation period. I took
the original PDFs, stripped out everything that would otherwise have
identified their origins, and removed all entries but a few test form
fields. Then I replaced the fields' partial fonts with some available ones
(I believe it was Garamond). Reading through Tilman's replies, I learned
that this also led to font-handling issues.


> So to transfer the value from your original form to the new template you
> have to
>
> a) check whether getValue() returns anything but null. If so, use
> setValue() with the value provided by getValue() to fill out the
> corresponding field in your template
> b) if getValue() is null, use isChecked() to see whether the checkbox has
> been checked. If so, use check() to check the checkbox
>
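Steps a) and b) can be sketched in plain Java. The FormField interface, SimpleField, and FormTransfer are hypothetical; the method names echo those used in this thread (getValue, setValue, isChecked, check), but the real PDFBox classes and signatures may differ, particularly in the 2.0 snapshot. Fields are matched by fully qualified name, on the assumption that the template uses the same names as the original.

```java
import java.util.List;
import java.util.Map;

// Hypothetical field abstraction. The method names echo this thread
// (getValue, setValue, isChecked, check); this is NOT the PDFBox API.
interface FormField {
    String getName();            // fully qualified name, used to match fields
    String getValue();           // the /V entry, or null if none is set
    void setValue(String value);
    boolean isCheckBox();
    boolean isChecked();         // derived from the appearance state
    void check();
}

// In-memory stand-in so the transfer logic can be exercised without a PDF.
class SimpleField implements FormField {
    private final String name;
    private final boolean checkBox;
    private String value;
    private boolean checked;

    SimpleField(String name, boolean checkBox, String value, boolean checked) {
        this.name = name;
        this.checkBox = checkBox;
        this.value = value;
        this.checked = checked;
    }

    public String getName() { return name; }
    public String getValue() { return value; }
    public void setValue(String value) { this.value = value; }
    public boolean isCheckBox() { return checkBox; }
    public boolean isChecked() { return checked; }
    // Simplification: only flips the visual state; a real check() would also set /V.
    public void check() { checked = true; }
}

final class FormTransfer {
    private FormTransfer() {}

    // Copies filled-in data into the template fields that have the same name.
    static void transfer(List<? extends FormField> source,
                         Map<String, ? extends FormField> templateByName) {
        for (FormField src : source) {
            FormField dst = templateByName.get(src.getName());
            if (dst == null) {
                continue; // no corresponding field in the template
            }
            String value = src.getValue();
            if (value != null) {
                dst.setValue(value); // a) a /V entry exists: copy it
            } else if (src.isCheckBox() && src.isChecked()) {
                dst.check();         // b) no /V, but the appearance says "checked"
            }
        }
    }
}
```

The design choice here is to write into the template's own field objects, looked up by name, rather than to move field objects from one document to the other.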

I thought that's what I did after your last suggestion (where you wrote
exactly those two lines as well); however, I have the distinct feeling that
I did something else wrong. Tilman Hausherr kindly provided me with some
test code that seems to work for the test PDFs I provided. After quickly
glancing over his code, I have already spotted one basic mistake in mine.
Cloning fields from one PDF to another seems to involve instantiating a new
PDField object in the template PDF. I had assumed that assigning the values
of the fields read from the source PDF to the template PDF would be enough.
It never would have occurred to me that one needs to instantiate a new
PDField object.

Anyway, I'll rewrite my code again to incorporate all this new knowledge
and update to a SNAPSHOT version of PDFBox. Unfortunately, I don't know how
to get Maven to reference the newest SVN trunk automatically, so I check
out the trunk and build a JAR to use as a reference. If the project used
git, one could use the https://jitpack.io/ add-on for Maven.

In fact, the canonical source SCM is SVN at apache.org:
https://svn.apache.org/repos/asf/pdfbox/
There is a copy/sync on GitHub (https://github.com/apache/pdfbox); however,
it only syncs the old 1.8.9 tree, not the current 2.0.0 snapshot tree.

I suppose that using the latest SNAPSHOT of PDFBox and all dependencies
should suffice for my test case.

> We have made some changes to how checkboxes and radio buttons are handled
> in PDFBox 2.0 in the last few days (to make them easier to work with), so
> please use the latest snapshot version of PDFBox.
>
> There will be an issue with the test template when you set the Name and
> Prename fields, as the field definition is incomplete (the font resource
> is missing), which will lead to an exception
>
> java.io.IOException: Could not find font: /Courier
>
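The failure can be illustrated with a small model, assuming the usual layout: each text field's default appearance string (/DA), such as "/Courier 10 Tf 0 g", names a font resource that must exist in the form's default resources (/DR). DefaultAppearanceCheck is hypothetical and only mimics the lookup; it is not how PDFBox implements it.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model (NOT PDFBox): resolve the font named in a field's default
// appearance string (/DA) against the form's default resources (/DR /Font).
class DefaultAppearanceCheck {
    // Stand-in for /DR /Font: resource name (without slash) -> font description.
    final Map<String, String> drFonts = new HashMap<>();

    // Returns the font operand of the "Tf" operator, e.g. "Courier" for
    // "/Courier 10 Tf 0 g" (operands precede the operator: /Name size Tf).
    static String fontNameFromDA(String da) {
        String[] tokens = da.trim().split("\\s+");
        for (int i = 2; i < tokens.length; i++) {
            if (tokens[i].equals("Tf") && tokens[i - 2].startsWith("/")) {
                return tokens[i - 2].substring(1);
            }
        }
        throw new IllegalArgumentException("no Tf operator in DA: " + da);
    }

    // Fails in the way quoted above when the font resource is missing.
    String resolveFont(String da) throws IOException {
        String name = fontNameFromDA(da);
        String font = drFonts.get(name);
        if (font == null) {
            throw new IOException("Could not find font: /" + name);
        }
        return font;
    }
}
```

Correcting the template so that /DR contains a font under the name used in /DA lets the lookup succeed.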

That's probably because I didn't know what I was doing when stripping down
the original PDF to provide you with a test case. I will certainly try the
new code with the real PDFs and report back as soon as things have
progressed.


> The easiest would be to correct the template. If that's not possible, we
> could help you build a short workaround. But as the template you provided
> was only a quick mock-up and not the real one, the final template might
> not have this issue.
>
> If you need further assistance please let us know.
>

Thanks so much!!!

Best regards

Roberto
