pdfbox-users mailing list archives

From Tres Finocchiaro <tres.finocchi...@gmail.com>
Subject Re: PDF scraping -> Access DB part 2
Date Wed, 17 Feb 2016 03:24:35 GMT
This question really isn't about PDFBOX.

It's really about best practices for dynamic SQL generation and insertion,
and (assuming you're using Java) partly about how to use JDBC to connect to
Access:
<http://stackoverflow.com/questions/9543722/create-an-access-database-file-mdb-or-accdb-using-java>

Here's an example of a Prepared Statement:
http://www.mkyong.com/jdbc/jdbc-preparestatement-example-insert-a-record/

Again, this isn't really PDFBOX related, but rather a design decision you
have to make as a developer.

Prepared Statements can't solve all of your dynamic insertion problems:
they parameterize values, not table or column names.

   - If you're building your insert statements from scratch, you'll
   want to be very careful to sanitize your table names
   <https://github.com/LMMS/lmms.io/blob/master/public/lsp/dbo.php#L91:L115>,
   field names
   <https://github.com/LMMS/lmms.io/blob/master/public/lsp/utils.php#L178:L184>
   and variables (the PHP links are provided only as illustrations).
   - If the PDF has generically named fields, you may want to map them to
   column names in an Enum or Hashtable, e.g. foo.put("Field1",
   "FirstName"); etc.

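As a rough sketch of the two points above (not tested against Access), the
mapping can double as a whitelist: only PDF fields that have an entry in the
map become columns, and the values are bound through a Prepared Statement.
The class name FieldMapper, the table name "People" and the Field1/Field2
entries are made up for illustration, and the Connection is assumed to come
from whatever JDBC driver you use for Access.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FieldMapper {
    // Hypothetical whitelist: PDF field name -> database column name.
    static final Map<String, String> COLUMN_FOR_FIELD = new LinkedHashMap<>();
    static {
        COLUMN_FOR_FIELD.put("Field1", "FirstName");
        COLUMN_FOR_FIELD.put("Field2", "LastName");
    }

    /** Generated SQL plus the values to bind, in placeholder order. */
    static final class Insert {
        final String sql;
        final List<String> values;
        Insert(String sql, List<String> values) {
            this.sql = sql;
            this.values = values;
        }
    }

    /** Builds a parameterized INSERT from the PDF fields that have a mapping. */
    static Insert buildInsert(String table, Map<String, String> pdfValues) {
        // Table names cannot be bound as parameters, so restrict them to plain identifiers.
        if (!table.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Unsafe table name: " + table);
        }
        List<String> columns = new ArrayList<>();
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> entry : pdfValues.entrySet()) {
            String column = COLUMN_FOR_FIELD.get(entry.getKey());
            if (column != null) { // skip PDF fields with no corresponding column
                columns.add(column);
                values.add(entry.getValue());
            }
        }
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("No mapped fields to insert");
        }
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        String sql = "INSERT INTO " + table + " (" + String.join(", ", columns)
                + ") VALUES (" + placeholders + ")";
        return new Insert(sql, values);
    }

    /** Binds the values and runs the insert on an already opened connection. */
    static void execute(Connection conn, Insert insert) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(insert.sql)) {
            for (int i = 0; i < insert.values.size(); i++) {
                ps.setString(i + 1, insert.values.get(i)); // JDBC parameters are 1-based
            }
            ps.executeUpdate();
        }
    }
}
```

Each form would then be one call to buildInsert followed by execute. Because
column names only ever come from the map, nothing read from the PDF ends up
in the SQL text itself.
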
Lastly, I'd recommend posting this on StackOverflow as a generic
"best practices" question, since others could likely benefit from the
answer.

- Tres.Finocchiaro@gmail.com

On Tue, Feb 16, 2016 at 8:50 PM, Al Grant <bigal.nz@gmail.com> wrote:

> Hi Everybody
>
> Over the last few days I have been experimenting with little bits of code
> to scrape data from a PDF form, and made a little bit of test code to
> insert test data into an Access database.
>
> My ultimate aim is to insert the data from the PDF into the database, where
> the data from each form is a new record.
>
> My question however is the best way to approach this in terms of the actual
> mapping of the fields from the PDF form to the database....
>
> My form probably has about 150 fields, and rather than hard coding
> pdf.name1-> db.name1 I was thinking perhaps of reading the field names from
> the pdf with a do while loop and then writing to the database if and only
> if there is a corresponding field?
>
> Thoughts?
>
> Cheers
>
> -Al
>
>
> --
> "Beat it punk!"
> - Clint Eastwood
>
