pdfbox-users mailing list archives

From Evan Williams <evan.willi...@zapprx.com>
Subject Trouble With Dots In Field Names
Date Sat, 24 Sep 2016 15:13:33 GMT
I have a problem, but I think it's non-terminal.

I have been using PDFBox to work with forms for about a year and a half,
and I have a handle on many things, but I have a persistent and pernicious
issue with forms where fields have periods ('.') in their name.

These forms are from external sources and are typically old-school
AcroForms. Because of the nature of the forms (medical), they often contain
decimal values like '0.5 mg' or 'W55.21'. These forms never seem to have
been meant to be read programmatically. They are for human consumption.

As far as I can tell, '.' is a magic character in fully qualified names:
it delimits the elements of the path. So when I iterate over the fields, I
get a bunch of name fragments as 'PDNonTerminalField's and regular terminal
fields.
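
If, as the fragments suggest, each dotted label is stored as a chain of parent and child fields (for example, a parent with partial name "0" and a kid named "5 mg"), then joining the partial names back together with '.' recovers the label a human sees. In PDFBox 2.x this is what PDField.getFullyQualifiedName() returns, and PDAcroForm.getFieldTree() iterates over every field in the hierarchy, not just the top-level ones. The sketch below assumes that storage layout and uses a hypothetical stand-in Node class instead of PDFBox so it runs on its own. It keeps only the leaves, which correspond to the terminal fields.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldNames {
    // Hypothetical, minimal model of an AcroForm field node; not a PDFBox class.
    static final class Node {
        final String partialName;            // the field's own partial name
        final List<Node> kids = new ArrayList<>();

        Node(String partialName) { this.partialName = partialName; }

        Node add(Node kid) { kids.add(kid); return this; }
    }

    // Depth-first walk: a node's fully qualified name is its parent's fully
    // qualified name, a '.', and its own partial name. Only leaves are kept.
    static void collect(Node node, String parentName, List<String> out) {
        String name = parentName.isEmpty()
                ? node.partialName
                : parentName + "." + node.partialName;
        if (node.kids.isEmpty()) {
            out.add(name);                   // leaf: a terminal field
        } else {
            for (Node kid : node.kids) {
                collect(kid, name, out);     // non-terminal: descend
            }
        }
    }

    static List<String> terminalNames(List<Node> roots) {
        List<String> out = new ArrayList<>();
        for (Node root : roots) {
            collect(root, "", out);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed layout: "0.5 mg" stored as "0" -> "5 mg", "W55.21" as "W55" -> "21".
        List<Node> roots = List.of(
                new Node("0").add(new Node("5 mg")),
                new Node("W55").add(new Node("21")),
                new Node("PatientName"));
        System.out.println(terminalNames(roots)); // [0.5 mg, W55.21, PatientName]
    }
}
```

Against a real document, the same idea would be to skip PDNonTerminalField instances while walking getFieldTree() and key everything on getFullyQualifiedName(). PDAcroForm.getField(String) also looks fields up by that fully qualified name.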

My current way of dealing with this is to waste the time of a skilled
graphic designer, or my own time, manually going in and fixing it. This is
mostly just an annoyance. But annoyances add up. And I am trying to
automate as much as I possibly can in dealing with these forms.

*Is there any obvious way to identify this corrupt situation and correct it?*
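
One rough way to flag the affected fields, assuming their fully qualified names are already available: treat any name in which a '.' sits between two digits, as in '0.5 mg' or 'W55.21', as a likely decimal value that was split into a path. This is a hypothetical heuristic, not something PDFBox provides. It will miss splits like 'Take 0.5' followed by text without a digit after the dot, and it will also flag legitimate hierarchies such as a hypothetical 'Page1.2'.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitDetector {
    // A '.' with a digit immediately on both sides, e.g. "0.5 mg" or "W55.21".
    private static final Pattern DIGIT_DOT_DIGIT = Pattern.compile("\\d\\.\\d");

    // Hypothetical heuristic: true if the name looks like a decimal cut at its '.'.
    static boolean looksLikeSplitDecimal(String fullyQualifiedName) {
        return DIGIT_DOT_DIGIT.matcher(fullyQualifiedName).find();
    }

    // Keeps only the names the heuristic flags, in their original order.
    static List<String> suspicious(List<String> fullyQualifiedNames) {
        return fullyQualifiedNames.stream()
                .filter(SplitDetector::looksLikeSplitDecimal)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("0.5 mg", "W55.21", "patient.name", "Section1.Name");
        System.out.println(suspicious(names)); // [0.5 mg, W55.21]
    }
}
```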

I wonder if I am just doing something wrong (I am iterating over the
fields in the time-honored way, as the form example included with
PDFBox does).

Adobe Acrobat seems perfectly happy to deal with fields containing periods
(including, unfortunately, allowing people to create them). So there must
be some way to deal with this.

Your advice would be of great service to me.

Thank you.
*Evan Williams*
Sr. Software Engineer

*www.ZappRx.com <http://www.zapprx.com/>*
