pdfbox-users mailing list archives

From Al Grant <bigal...@gmail.com>
Subject Re: PDF scraping -> Access DB part 2
Date Wed, 17 Feb 2016 03:33:03 GMT
Thanks Tres. All good points and I will look at those links.



On Wed, Feb 17, 2016 at 4:24 PM, Tres Finocchiaro <
tres.finocchiaro@gmail.com> wrote:

> This question really isn't about PDFBOX.
>
> It's really about dynamic SQL generation/insertion and best practices and
> also (assuming you're using Java) gets a bit into how to use JDBC to
> connect to Access
> <http://stackoverflow.com/questions/9543722/create-an-access-database-file-mdb-or-accdb-using-java>.
>
> Here's an example of a Prepared Statement:
> http://www.mkyong.com/jdbc/jdbc-preparestatement-example-insert-a-record/
>
> Again, this isn't really PDFBOX related, but rather a design decision you
> have to make as a developer.
>
> In regards to Prepared Statements, they can't solve all of your dynamic
> insertion problems.
>
>    - If you're building out your insert statements from scratch, you'll
>    want to be very careful to sanitize your table names
>    <https://github.com/LMMS/lmms.io/blob/master/public/lsp/dbo.php#L91:L115>,
>    field names
>    <https://github.com/LMMS/lmms.io/blob/master/public/lsp/utils.php#L178:L184>
>    and variables (PHP examples provided for illustration only).
>    - If you have generically named fields in the PDF you can choose from,
>    you may want to index them in an Enum or HashTable, e.g.
>    foo.put("Field1", "FirstName"); etc.
>
> Last, I'd recommend posting this question on StackOverflow as a generic
> "best practices" question, since others could likely benefit from its
> solution.
>
> - Tres.Finocchiaro@gmail.com
>
> On Tue, Feb 16, 2016 at 8:50 PM, Al Grant <bigal.nz@gmail.com> wrote:
>
> > Hi Everybody
> >
> > Over the last few days I have been experimenting with little bits of code
> > to scrape data from a PDF form, and made a little bit of test code to
> > insert test data into an Access database.
> >
> > My ultimate aim is to insert the data from the PDF into the database, where
> > the data from each form becomes a new record.
> >
> > My question, however, is about the best way to approach this in terms of
> > the actual mapping of the fields from the PDF form to the database....
> >
> > My form probably has about 150 fields, and rather than hard coding
> > pdf.name1 -> db.name1 I was thinking perhaps of reading the field names
> > from the PDF with a do while loop and then writing to the database if and
> > only if there is a corresponding field?
> >
> > Thoughts?
> >
> > Cheers
> >
> > -Al
> >
> >
> > --
> > "Beat it punk!"
> > - Clint Eastwood
> >
>
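For what it's worth, the mapping idea from the quoted thread (a lookup table from PDF field names to column names, plus a check on identifiers before they go into the SQL text) could be sketched roughly as below. This is only a sketch under assumptions: the class name PdfFieldMapper, the table name Applicants, and every field and column name other than Field1/FirstName are made up for illustration, and the identifier rule (letters, digits, underscore) is just one conservative choice. The values themselves would still be bound through a PreparedStatement; only the column list is built dynamically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class PdfFieldMapper {
    // Hypothetical lookup table: PDF form field name -> database column name.
    static final Map<String, String> FIELD_TO_COLUMN = new LinkedHashMap<>();
    static {
        FIELD_TO_COLUMN.put("Field1", "FirstName");
        FIELD_TO_COLUMN.put("Field2", "LastName");
    }

    // Assumed rule: identifiers start with a letter, then letters, digits or underscores.
    private static final Pattern SAFE_IDENTIFIER = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    /** Returns the name unchanged if it looks safe, otherwise throws. */
    static String checkIdentifier(String name) {
        if (name == null || !SAFE_IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("Unsafe SQL identifier: " + name);
        }
        return name;
    }

    /** Keeps only the PDF fields that have a corresponding column, in input order. */
    static List<String> mappedColumns(Iterable<String> pdfFieldNames) {
        List<String> columns = new ArrayList<>();
        for (String field : pdfFieldNames) {
            String column = FIELD_TO_COLUMN.get(field);
            if (column != null) {
                columns.add(checkIdentifier(column));
            }
        }
        return columns;
    }

    /** Builds the INSERT text with one '?' placeholder per column. */
    static String buildInsertSql(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("No columns to insert");
        }
        for (String column : columns) {
            checkIdentifier(column);
        }
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + checkIdentifier(table)
                + " (" + String.join(", ", columns) + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // "Signature" has no mapping, so it is skipped.
        List<String> pdfFields = Arrays.asList("Field1", "Signature", "Field2");
        List<String> columns = mappedColumns(pdfFields);
        // Prints: INSERT INTO Applicants (FirstName, LastName) VALUES (?, ?)
        System.out.println(buildInsertSql("Applicants", columns));
    }
}
```

The resulting string would then go to Connection.prepareStatement, with each field value bound via setString in the same order as the column list. With around 150 fields, the lookup table could be loaded from a properties file instead of being hard coded.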



-- 
"Beat it punk!"
- Clint Eastwood
