pdfbox-users mailing list archives

From Frank van der Hulst <drifter.fr...@gmail.com>
Subject Re: PDF extraction
Date Mon, 02 Feb 2015 18:56:28 GMT
I have written a couple of Java classes that extract tabular data to arrays
of Strings.

One works where the location of each column is fixed. The other figures out
the locations of columns from the table headers and outline drawing.
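
The fixed-column case can be sketched roughly as follows. This is a minimal illustration, not the classes mentioned above: the `FixedColumnTable` and `Fragment` names, the column edges, and the row tolerance are all hypothetical. It assumes each text fragment already carries page coordinates with y growing downward; with PDFBox those would typically be collected from `TextPosition` objects in a `PDFTextStripper` subclass. Because fragments can arrive in any order, they are first grouped into rows by y, then sorted by x and dropped into the column whose fixed x range contains them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class FixedColumnTable {

    /** Hypothetical stand-in for a positioned piece of text (not a PDFBox type). */
    static final class Fragment {
        final float x;
        final float y;
        final String text;

        Fragment(float x, float y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /**
     * Builds a table from unordered fragments.
     * columnEdges: ascending x boundaries; column i covers [edges[i], edges[i+1]).
     * rowTolerance: a fragment joins the current row if its y is within this
     * distance of the first fragment of that row.
     */
    static String[][] extract(List<Fragment> fragments, float[] columnEdges, float rowTolerance) {
        // Sort top to bottom (assumption: y grows downward).
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y));

        // Group fragments with nearly equal y into rows.
        List<List<Fragment>> rows = new ArrayList<>();
        for (Fragment f : sorted) {
            List<Fragment> last = rows.isEmpty() ? null : rows.get(rows.size() - 1);
            if (last == null || f.y - last.get(0).y > rowTolerance) {
                last = new ArrayList<>();
                rows.add(last);
            }
            last.add(f);
        }

        String[][] table = new String[rows.size()][columnEdges.length - 1];
        for (int r = 0; r < rows.size(); r++) {
            Arrays.fill(table[r], "");
            List<Fragment> row = rows.get(r);
            // Within a row, read left to right.
            row.sort(Comparator.comparingDouble((Fragment f) -> f.x));
            for (Fragment f : row) {
                int c = columnOf(f.x, columnEdges);
                if (c < 0) {
                    continue; // outside every column: ignored in this sketch
                }
                table[r][c] = table[r][c].isEmpty() ? f.text : table[r][c] + " " + f.text;
            }
        }
        return table;
    }

    /** Returns the index of the column containing x, or -1 if none does. */
    private static int columnOf(float x, float[] edges) {
        for (int i = 0; i + 1 < edges.length; i++) {
            if (x >= edges[i] && x < edges[i + 1]) {
                return i;
            }
        }
        return -1;
    }
}
```

Calling `FixedColumnTable.extract(fragments, new float[] {0f, 90f, 190f, 290f}, 2f)` would give one `String[]` per row with three cells each. The edges have to be measured from the actual PDF, which is why this only works when the layout never changes.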

The usual story applies... hardly any documentation, and they only work for
limited cases. I've sent the code to Lorena... I'd be grateful if you could
improve the documentation.

NB: I'll be out of reach of my computer (and therefore my source code) for
the next few days, but will probably still be able to answer emails.

Frank


On Tue, Feb 3, 2015 at 7:07 AM, Tilman Hausherr <THausherr@t-online.de>
wrote:

> Hi Lorena,
>
> There is no concept of a table in a PDF, except in a tagged PDF.
>
> A table is just lines and text, in no specific order. It could also be an
> image of a table.
>
> You can succeed in this only if you know the structure of the PDF in
> advance, e.g. when it all comes from the same client.
>
> https://stackoverflow.com/questions/23495372/extract-table-data-from-pdf
> https://stackoverflow.com/questions/17591426/extract-table-from-a-pdf
> https://stackoverflow.com/questions/17217194/extracting-table-contents-from-a-collection-of-pdf-files
> https://stackoverflow.com/questions/3424588/programmatically-extract-pdf-tables
>
> Tilman
>
>
> On 02.02.2015 at 16:29, Lorena Leishman wrote:
>
>> Hi,
>> I have a PDF that has information displayed in tables. Example:
>> Company Name:   Barnes & Noble   Bank Of America   Macy's
>> Account #:      123xxxxx         345xxxx           679xxxx
>> Status:         Open             Closed            Open
>> Balance:        $23.             $0.00             $100
>> Is there a way with PDFBox to extract specific values from the table?
>> Example: Bank Of America and $0.00
>> And also is there a way to cut the whole table and paste it into a
>> different PDF?
>> Please let me know, Thanks!
>> Lorena
>>
