pdfbox-users mailing list archives

From Olaf Drümmer <olafl...@callassoftware.com>
Subject Re: Trouble With Dots In Field Names
Date Sat, 24 Sep 2016 18:56:10 GMT
AFAIK the period serves as a delimiter for nodes and leaves in a tree.

Example:

sender.address.name.first
sender.address.name.last
sender.address.street.name
sender.address.street.number
sender.address.ZIP
sender.address.city
… 

The actual fields (the ones that can contain a value) are the leaf items: first, last, name,
number, ZIP, city.

To the best of my knowledge, if a field is named “W55.21”, it is actually a leaf item “21” (that
can have a value) inside a parent node “W55” (that can’t hold a value).
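
To make the parent/leaf reading concrete, here is a minimal sketch in plain Java (no PDFBox calls) of how a fully qualified name splits at its periods. The class and method names are made up for illustration, and it assumes the period is always treated as the delimiter, as described above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox: shows how a dotted name
// is read as a path through the field tree.
final class FieldNameSketch {

    // All path segments, from the root node down to the leaf.
    public static List<String> segments(String fullyQualifiedName) {
        return Arrays.asList(fullyQualifiedName.split("\\."));
    }

    // The leaf item: the part after the last period (the field that holds a value).
    public static String leafOf(String fullyQualifiedName) {
        int dot = fullyQualifiedName.lastIndexOf('.');
        return dot < 0 ? fullyQualifiedName : fullyQualifiedName.substring(dot + 1);
    }

    // The parent path: everything before the last period, or "" for a top-level field.
    public static String parentOf(String fullyQualifiedName) {
        int dot = fullyQualifiedName.lastIndexOf('.');
        return dot < 0 ? "" : fullyQualifiedName.substring(0, dot);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"sender.address.name.first", "W55.21"}) {
            System.out.println(name + ": parent=" + parentOf(name) + ", leaf=" + leafOf(name));
        }
    }
}
```

Read this way, a label like “0.5 mg” ends up as a leaf “5 mg” under a parent “0”, which is presumably not what the form author intended.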

It looks like someone built AcroForm forms without understanding AcroForm forms.

Not sure how to “fix” this by using PDFBox. Maybe you need to rename the fields into something
that doesn’t use a period.
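
If renaming turns out to be the way to go, the name mapping itself could look roughly like the sketch below. This is plain Java only; the class, the method names and the choice of an underscore as the replacement character are my assumptions. I have not tried applying the result to a form: in PDFBox 2.x that would presumably go through PDField.setPartialName, and where the period has already been turned into a parent/child hierarchy, the tree would need restructuring as well.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of PDFBox: plans period-free replacement names.
final class FieldRenameSketch {

    // Replaces every period so the name is a single path segment.
    // The underscore is an arbitrary choice for this sketch.
    public static String withoutPeriods(String name) {
        return name.replace('.', '_');
    }

    // Builds an old-name -> new-name plan, appending a counter
    // whenever two different old names would map to the same new name.
    public static Map<String, String> renamePlan(List<String> names) {
        Map<String, String> plan = new LinkedHashMap<>();
        for (String name : names) {
            String candidate = withoutPeriods(name);
            String unique = candidate;
            int counter = 2;
            while (plan.containsValue(unique)) {
                unique = candidate + "_" + counter++;
            }
            plan.put(name, unique);
        }
        return plan;
    }
}
```

The collision check matters because a form may already contain a field called “W55_21” next to “W55.21”, and two fields sharing one name would be treated as the same field.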


Olaf



> On 24 Sep 2016, at 17:13, Evan Williams <evan.williams@zapprx.com> wrote:
> 
> I have a problem, but I think it's non-terminal.
> 
> I have been using PDFBox to work with forms for about a year and a half,
> and I have a handle on many things, but I have a persistent and pernicious
> issue with forms where fields have periods ('.') in their name.
> 
> These forms are from external sources and are typically old school
> AcroForms. Because of the nature of the forms (medical), they often contain
> decimal values like '0.5 mg' or 'W55.21'. These forms do not seem to have
> ever been meant to be read programmatically. They are for human consumption.
> 
> As far as I can tell, '.' is a magic character used by fully qualified
> names that delineates elements of the path. So when I iterate over the
> fields I get a bunch of name fragments as 'PDNonTerminalField's and regular
> fields.
> 
> My current way of dealing with this is to waste the time of a skilled
> graphic designer, or my own time, manually going in and fixing it. This is
> mostly just an annoyance. But annoyances add up. And I am trying to
> automate as much as I possibly can in dealing with these forms.
> 
> *Is there any obvious way to identify this corrupt situation and correct it?*
> 
> I wonder if I am just doing something wrong (I am iterating over the
> fields in the time honored way that the form example that is included with
> PDFBox uses).
> 
> Adobe Acrobat seems perfectly happy to deal with fields containing periods
> (including, unfortunately, allowing people to create them). So there must
> be some way to deal with this.
> 
> Your advice would be of great service to me.
> 
> Thank you.
> -- 
> *Evan Williams*
> Sr. Software Engineer
> evan.williams@zapprx.com
> 
> *www.ZappRx.com <http://www.zapprx.com/>*

