pdfbox-users mailing list archives

From Duane Nickull <du...@technoracle-systems.com>
Subject Re: How to parse check box content?
Date Tue, 04 Sep 2012 15:47:05 GMT
Ah, ok.  That makes sense then.  By checkbox I thought you meant a
checkbox you can electronically interact with.  If you do have a flat,
non-form PDF then OCR would be the only way.

One potential solution is to pre-process it with Acrobat and have it guess
the form fields (this actually works fairly well in some cases).  That would
turn it into a form PDF.  If you send me the form I can take a look.
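
For the flat-PDF case, the pixel-analysis idea raised in the quoted message
below could be sketched roughly as follows. This is only an illustration
using plain JDK classes, not PDFBox API: it assumes the page has already been
rendered to a BufferedImage by some other means and that the box rectangles
are known in image pixel coordinates. The class name, method names, and the
inset, threshold, and minimum-fraction parameters are all hypothetical and
would need tuning against real documents.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.List;

// Hypothetical helper: decides which box in a group looks "checked" by
// counting dark pixels inside each known box rectangle.
final class CheckBoxPixels {

    // Fraction of dark pixels inside the box, skipping `inset` pixels on
    // every side so that the box outline itself is not counted.
    static double darkFraction(BufferedImage page, Rectangle box,
                               int inset, int darkThreshold) {
        int x0 = Math.max(box.x + inset, 0);
        int y0 = Math.max(box.y + inset, 0);
        int x1 = Math.min(box.x + box.width - inset, page.getWidth());
        int y1 = Math.min(box.y + box.height - inset, page.getHeight());
        if (x1 <= x0 || y1 <= y0) {
            return 0.0; // nothing left to sample after applying the inset
        }
        int dark = 0;
        for (int y = y0; y < y1; y++) {
            for (int x = x0; x < x1; x++) {
                int rgb = page.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceived brightness (0..255).
                int luma = (r * 299 + g * 587 + b * 114) / 1000;
                if (luma < darkThreshold) {
                    dark++;
                }
            }
        }
        return (double) dark / ((x1 - x0) * (y1 - y0));
    }

    // Index of the box with the largest dark fraction, or -1 when no box
    // exceeds minFraction (i.e. nothing in the group appears checked).
    static int checkedIndex(BufferedImage page, List<Rectangle> group,
                            int inset, int darkThreshold, double minFraction) {
        int best = -1;
        double bestFraction = minFraction;
        for (int i = 0; i < group.size(); i++) {
            double f = darkFraction(page, group.get(i), inset, darkThreshold);
            if (f > bestFraction) {
                bestFraction = f;
                best = i;
            }
        }
        return best;
    }
}
```

Picking the box with the largest dark fraction matches the "at most one per
group" assumption described in the thread; a group where several boxes can be
checked would need a per-box threshold instead.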


Duane Nickull
***********************************
Technoracle Advanced Systems Inc.
Consulting and Contracting; Proven Results!
i.  Neo4J, PDF, Java, LiveCycle ES, Flex, AIR, CQ5 & Mobile
b. http://technoracle.blogspot.com
t.  @duanechaos
"Don't fear the Graph!  Embrace Neo4J"






On 2012-09-04 7:17 AM, "David Hoffer" <dhoffer6@gmail.com> wrote:

>FYI, I just learned from some email discussions with iText users that
>the check boxes and check marks are drawn using vector drawing
>instructions.  Is that something that PDFBox can help parse?  Or, since
>I know where the check boxes are located, am I better off converting
>the page to an image and 'reading' the data via pixel analysis?
>
>-Dave
>
>On Tue, Sep 4, 2012 at 7:18 AM, David Hoffer <dhoffer6@gmail.com> wrote:
>> Hi Duane,
>>
>> Thanks for your reply.  I'll attach a sample of the type of document I
>> am trying to parse.  As you can see it does have check boxes but it's
>> not a form based document.
>>
>> (Note that the check boxes might not be technically radio buttons from
>> the point of view of the PDF document...but in actual practice users
>> will check at most one per group of boxes.)
>>
>> Thanks,
>> -Dave
>>
>> On Mon, Sep 3, 2012 at 10:37 PM, Duane Nickull
>> <duane@technoracle-systems.com> wrote:
>>> If you have a PDF document with a check box, by definition it is a
>>> "form" (it is a mutable document).  A radio button is, by definition,
>>> a group of mutually exclusive choices (two cannot be chosen at once).
>>>
>>> It is not that tricky to get access to the checkbox.  There are
>>> examples on the PDFBox website and also within the API Docs.
>>>
>>> java.lang.Object
>>>   <http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Object.html?is-external=true>
>>>   org.apache.pdfbox.pdmodel.interactive.form.PDField
>>>     <http://pdfbox.apache.org/apidocs/org/apache/pdfbox/pdmodel/interactive/form/PDField.html>
>>>       org.apache.pdfbox.pdmodel.interactive.form.PDChoiceButton
>>>         <http://pdfbox.apache.org/apidocs/org/apache/pdfbox/pdmodel/interactive/form/PDChoiceButton.html>
>>>           org.apache.pdfbox.pdmodel.interactive.form.PDCheckbox
>>>
>>>
>>>
>>> isChecked
>>>
>>> public boolean isChecked()
>>>
>>> This will tell if this radio button is currently checked or not.
>>>
>>> Returns: true if the radio button is checked.
>>>
>>> If you require specific help with this, many of our staff are ex-adobe
>>> experts on PDF forms.
>>>
>>> Duane Nickull
>>>
>>> On 2012-09-03 6:47 AM, "David Hoffer" <dhoffer6@gmail.com> wrote:
>>>
>>>>I have PDFs (regular PDF, non-form type) that contain check boxes and
>>>>I need to parse which one is selected.  Each group of check boxes
>>>>acts as a radio button group where only one will be selected/checked.
>>>>How can I parse this to find out which box in the group is checked?
>>>>
>>>>Thanks,
>>>>-Dave
>>>
>>>


