pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: Trouble With Dots In Field Names
Date Thu, 29 Sep 2016 15:13:16 GMT
Hi,

> On 29.09.2016 at 15:47, Evan Williams <evan.williams@zapprx.com> wrote:
> 
> By good fortune I got a form in that shows the problem.
> 
> https://dl.dropboxusercontent.com/u/25802656/Tracleer%20Patient%20Enrollment%20and%20Consent%20Form%20Revised.pdf
> 
> There is a field that Acrobat quite happily calls 'Tracleer 62.5' and
> treats as an entirely normal text field. But of course PDFBox is confused
> by this.

The field name is "Tracleer 62.5 Quantity Text", and it is in fact two fields: one called
"Tracleer 62" with a child called "5 Quantity Text".

If you use 

PDField field = acroForm.getField("Tracleer 62.5 Quantity Text"); 

you'll be fine.
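
To illustrate why the full name works, here is a minimal sketch of how a parent/child pair yields a dotted fully qualified name. It is not PDFBox code: FieldNode is a hypothetical stand-in class. The assumption is that PDFBox builds the fully qualified name the same way, by joining each partial name along the path with '.' (the real counterparts should be PDField.getPartialName() and PDField.getFullyQualifiedName()).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a form field node; this is NOT a PDFBox class. */
final class FieldNode {
    final String partialName;
    final FieldNode parent;
    final List<FieldNode> kids = new ArrayList<>();

    FieldNode(String partialName, FieldNode parent) {
        this.partialName = partialName;
        this.parent = parent;
        if (parent != null) {
            parent.kids.add(this);
        }
    }

    /** A node without children plays the role of a terminal (fillable) field. */
    boolean isTerminal() {
        return kids.isEmpty();
    }

    /** Joins the partial names from the root down to this node with '.'. */
    String fullyQualifiedName() {
        return parent == null
                ? partialName
                : parent.fullyQualifiedName() + "." + partialName;
    }

    /** Collects the fully qualified names of all terminal fields under the roots. */
    static List<String> terminalNames(List<FieldNode> roots) {
        List<String> names = new ArrayList<>();
        for (FieldNode root : roots) {
            collect(root, names);
        }
        return names;
    }

    private static void collect(FieldNode node, List<String> out) {
        if (node.isTerminal()) {
            out.add(node.fullyQualifiedName());
            return;
        }
        for (FieldNode kid : node.kids) {
            collect(kid, out);
        }
    }

    public static void main(String[] args) {
        // The two fields from the sample form: a parent and its single child.
        FieldNode parent = new FieldNode("Tracleer 62", null);
        new FieldNode("5 Quantity Text", parent);
        List<FieldNode> roots = new ArrayList<>();
        roots.add(parent);
        // prints "Tracleer 62.5 Quantity Text"
        for (String name : terminalNames(roots)) {
            System.out.println(name);
        }
    }
}
```

Collecting only the terminal nodes and reading their fully qualified names gives back the name Acrobat displays, instead of the fragments you see when you look at each level separately.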

BR
Maruan


> 
> That is the kind of thing that I am talking about. And it is very easy to
> manually fix it in Acrobat of course, but I am trying to build automation
> tools and there are usually very important fields (the ones with the dots)
> that provide a great deal of informational content to my tools so they can
> reason about the form.
> 
> Thank you for looking at this.
> 
> On Sat, Sep 24, 2016 at 3:07 PM, Evan Williams <evan.williams@zapprx.com>
> wrote:
> 
>> Hi Maruan,
>> 
>> The answer to your question is yes, but my problem is that I tend to fix
>> the PDFs every time I find this issue so I am not certain that I have any
>> sitting around that show the problem. But it is easy enough to create. I
>> will just edit a PDF with Acrobat and put a dot in a field name. I will do
>> that later this afternoon.
>> 
>> Thank you.
>> 
>> On Sat, Sep 24, 2016 at 2:21 PM, Maruan Sahyoun <sahyoun@fileaffairs.de>
>> wrote:
>> 
>>> Hi,
>>> 
>>>> On 24.09.2016 at 17:13, Evan Williams <evan.williams@zapprx.com> wrote:
>>>> 
>>>> I have a problem, but I think it's non-terminal.
>>>> 
>>>> I have been using PDFBox to work with forms for about a year and a half,
>>>> and I have a handle on many things, but I have a persistent and
>>>> pernicious issue with forms where fields have periods ('.') in their name.
>>> 
>>> Would it be possible to upload a sample to a public location so that I
>>> can take a look?
>>> 
>>> BR
>>> 
>>> Maruan
>>> 
>>>> 
>>>> These forms are from external sources and are typically old school
>>>> AcroForms. Because of the nature of the forms (medical), they often
>>>> contain decimal values like '0.5 mg' or 'W55.21'. These forms do not seem
>>>> to have ever been meant to be read programmatically. They are for human
>>>> consumption.
>>>> 
>>>> As far as I can tell, '.' is a magic character used by fully qualified
>>>> names that delineates elements of the path. So when I iterate over the
>>>> fields I get a bunch of name fragments as 'PDNonTerminalField's and
>>>> regular fields.
>>>> 
>>>> My current way of dealing with this is to waste the time of a skilled
>>>> graphic designer, or my own time, manually going in and fixing it. This
>>>> is mostly just an annoyance. But annoyances add up. And I am trying to
>>>> automate as much as I possibly can in dealing with these forms.
>>>> 
>>>> *Is there any obvious way to identify this corrupt situation and
>>>> correct it?*
>>>> 
>>>> I wonder if I am just doing something wrong (I am iterating over the
>>>> fields in the time-honored way that the form example that is included
>>>> with PDFBox uses).
>>>> 
>>>> Adobe Acrobat seems perfectly happy to deal with fields containing
>>>> periods (including, unfortunately, allowing people to create them). So
>>>> there must be some way to deal with this.
>>>> 
>>>> Your advice would be of great service to me.
>>>> 
>>>> Thank you.
>>>> --
>>>> *Evan Williams*
>>>> Sr. Software Engineer
>>>> evan.williams@zapprx.com
>>>> 
>>>> *www.ZappRx.com <http://www.zapprx.com/>*
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>> 
>>> 
>> 
>> 
> 
> 

