pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: Trouble With Dots In Field Names
Date Sat, 24 Sep 2016 19:22:38 GMT
Hi Evan,

> On 24.09.2016 at 21:07, Evan Williams <evan.williams@zapprx.com> wrote:
> 
> Hi Maruan,
> 
> The answer to your question is yes, but my problem is that I tend to fix
> the PDFs every time I find this issue so I am not certain that I have any
> sitting around that show the problem. But it is easy enough to create. I
> will just edit a PDF with Acrobat and put a dot in a field name. I will do
> that later this afternoon.
> 

It may be possible to fix that automatically by replacing the dot in the field name with
something else, provided the period is part of the name and is not delimiting individual fields.
I'll come up with a sample as soon as I've seen the template, to be sure that there really is an
individual field (terminal field) with a period in its name. As Olaf pointed out, the period is a
delimiter between non-terminal and terminal fields. E.g. if you enter first.name in Acrobat for
the field name, you end up with a non-terminal field 'first' and a terminal field 'name'.
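
To make the delimiter behaviour concrete, here is a minimal sketch in plain Java. It does not
use PDFBox at all: the class FieldNameSketch and its two methods are hypothetical helpers made
up for illustration, and the underscore replacement is just one possible choice. It only shows
how a name typed as first.name is read as a hierarchy, and how a period inside a single partial
name could be swapped for another character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the PDFBox API.
final class FieldNameSketch {

    // Split a fully qualified field name on '.', the hierarchy delimiter.
    // Every element but the last names a non-terminal field; the last one
    // names the terminal field.
    static List<String> splitQualifiedName(String fullyQualifiedName) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < fullyQualifiedName.length(); i++) {
            if (fullyQualifiedName.charAt(i) == '.') {
                parts.add(fullyQualifiedName.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(fullyQualifiedName.substring(start));
        return parts;
    }

    // Replace every period inside a single partial name with another
    // character, so the name can no longer be mistaken for a path.
    static String sanitizePartialName(String partialName, char replacement) {
        return partialName.replace('.', replacement);
    }

    public static void main(String[] args) {
        System.out.println(splitQualifiedName("first.name"));   // [first, name]
        System.out.println(sanitizePartialName("0.5 mg", '_')); // 0_5 mg
    }
}
```

Whether the replacement is safe for a given form still depends on the period really being part
of one terminal field's name, which is why the template needs to be checked first.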

BR
Maruan

> Thank you.
> 
> On Sat, Sep 24, 2016 at 2:21 PM, Maruan Sahyoun <sahyoun@fileaffairs.de>
> wrote:
> 
>> Hi,
>> 
>>> On 24.09.2016 at 17:13, Evan Williams <evan.williams@zapprx.com> wrote:
>>> 
>>> I have a problem, but I think it's non-terminal.
>>> 
>>> I have been using PDFBox to work with forms for about a year and a half,
>>> and I have a handle on many things, but I have a persistent and pernicious
>>> issue with forms where fields have periods ('.') in their name.
>> 
>> Would it be possible to upload a sample to a public location so I can take a
>> look?
>> 
>> BR
>> 
>> Maruan
>> 
>>> 
>>> These forms are from external sources and are typically old school
>>> AcroForms. Because of the nature of the forms (medical), they often contain
>>> decimal values like '0.5 mg' or 'W55.21'. These forms do not seem to have
>>> ever been meant to be read programmatically. They are for human consumption.
>>> 
>>> As far as I can tell, '.' is a magic character used by fully qualified
>>> names that delineates elements of the path. So when I iterate over the
>>> fields I get a bunch of name fragments as 'PDNonTerminalField's and regular
>>> fields.
>>> 
>>> My current way of dealing with this is to waste the time of a skilled
>>> graphic designer, or my own time, manually going in and fixing it. This is
>>> mostly just an annoyance. But annoyances add up. And I am trying to
>>> automate as much as I possibly can in dealing with these forms.
>>> 
>>> *Is there any obvious way to identify this corrupt situation and correct it?*
>>> 
>>> I wonder if I am just doing something wrong (I am iterating over the
>>> fields in the time-honored way that the form example that is included with
>>> PDFBox uses).
>>> 
>>> Adobe Acrobat seems perfectly happy to deal with fields containing periods
>>> (including, unfortunately, allowing people to create them). So there must
>>> be some way to deal with this.
>>> 
>>> Your advice would be of great service to me.
>>> 
>>> Thank you.
>>> --
>>> *Evan Williams*
>>> Sr. Software Engineer
>>> evan.williams@zapprx.com
>>> 
>>> *www.ZappRx.com <http://www.zapprx.com/>*
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>> For additional commands, e-mail: users-help@pdfbox.apache.org
>> 
>> 
> 
> 
> -- 
> *Evan Williams*
> Sr. Software Engineer
> evan.williams@zapprx.com
> 
> *www.ZappRx.com <http://www.zapprx.com/>*

