pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: PDF extraction
Date Mon, 02 Feb 2015 18:07:16 GMT
Hi Lorena,

There is no concept of table in a PDF, except in a tagged PDF.

A table is just lines and text, drawn in no particular order. It could 
also be an image of a table.

You can succeed in this only if you know the structure of the PDF in 
advance, e.g. when it all comes from the same client.
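
As a rough illustration of what "knowing the structure in advance" buys 
you, here is a minimal sketch in plain Java. It does not call PDFBox at 
all. The Fragment class, the COLUMN_STARTS values and the sample 
coordinates are made up for the example; in a real program the text and 
positions would have to come from the extraction step (for instance from 
the TextPosition objects a PDFTextStripper subclass receives), and the 
column edges would be measured once on a sample file. It also assumes 
that y grows downwards and that the first row holds the column headers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class KnownLayoutTable {

    /** Hypothetical holder for one piece of text and the position it was drawn at. */
    static final class Fragment {
        final String text;
        final float x;
        final float y;

        Fragment(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Assumed layout, known in advance: left edge of each column in page units.
    static final float[] COLUMN_STARTS = {0f, 100f, 220f, 340f};
    // A fragment more than this far below the first fragment of a row starts a new row.
    static final float ROW_TOLERANCE = 2f;

    /** Index of the last column whose left edge is at or before x. */
    static int columnOf(float x) {
        int column = 0;
        for (int i = 0; i < COLUMN_STARTS.length; i++) {
            if (x >= COLUMN_STARTS[i]) {
                column = i;
            }
        }
        return column;
    }

    /** Rebuilds rows (top to bottom) from fragments given in any order. */
    static List<String[]> buildRows(List<Fragment> fragments) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        // Sort by y so rows come out top to bottom; x breaks ties.
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y)
                .thenComparingDouble(f -> f.x));
        List<String[]> rows = new ArrayList<>();
        String[] current = null;
        float rowY = 0f;
        for (Fragment f : sorted) {
            if (current == null || f.y - rowY > ROW_TOLERANCE) {
                current = new String[COLUMN_STARTS.length];
                Arrays.fill(current, "");
                rows.add(current);
                rowY = f.y;
            }
            int c = columnOf(f.x);
            current[c] = current[c].isEmpty() ? f.text : current[c] + " " + f.text;
        }
        return rows;
    }

    /** Cell in the row labelled rowLabel, under the column headed header, or null. */
    static String lookup(List<String[]> rows, String header, String rowLabel) {
        if (rows.isEmpty()) {
            return null;
        }
        int column = Arrays.asList(rows.get(0)).indexOf(header);
        if (column < 0) {
            return null;
        }
        for (String[] row : rows) {
            if (row[0].equals(rowLabel)) {
                return row[column];
            }
        }
        return null;
    }

    /** The table from the question with invented coordinates, deliberately shuffled. */
    static List<Fragment> sample() {
        return Arrays.asList(
            new Fragment("$0.00", 220f, 55f), new Fragment("Open", 340f, 40f),
            new Fragment("Company Name:", 0f, 10f), new Fragment("679xxxx", 340f, 25f),
            new Fragment("Bank Of America", 220f, 10f), new Fragment("Status:", 0f, 40f),
            new Fragment("$100", 340f, 55f), new Fragment("Barnes & Noble", 100f, 10f),
            new Fragment("Balance:", 0f, 55f), new Fragment("345xxxx", 220f, 25f),
            new Fragment("Macy's", 340f, 10f), new Fragment("Closed", 220f, 40f),
            new Fragment("Account #:", 0f, 25f), new Fragment("$23.", 100f, 55f),
            new Fragment("Open", 100f, 40f), new Fragment("123xxxxx", 100f, 25f));
    }

    public static void main(String[] args) {
        List<String[]> rows = buildRows(sample());
        System.out.println(lookup(rows, "Bank Of America", "Balance:")); // prints $0.00
    }
}
```

The fixed column edges are the part that only works for PDFs with a known 
layout. If the edges differ from file to file, the lookup silently puts 
values into the wrong column.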

https://stackoverflow.com/questions/23495372/extract-table-data-from-pdf
https://stackoverflow.com/questions/17591426/extract-table-from-a-pdf
https://stackoverflow.com/questions/17217194/extracting-table-contents-from-a-collection-of-pdf-files
https://stackoverflow.com/questions/3424588/programmatically-extract-pdf-tables

Tilman


On 02.02.2015 at 16:29, Lorena Leishman wrote:
> Hi,
> I have a PDF that has information displayed in tables. Example:
> Company Name:   Barnes & Noble   Bank Of America   Macy's
> Account #:      123xxxxx         345xxxx           679xxxx
> Status:         Open             Closed            Open
> Balance:        $23.             $0.00             $100
> Is there a way with PDFBox to extract specific values from the table? Example: Bank
> Of America and $0.00
> And also is there a way to cut the whole table and paste it into a different PDF?
> Please let me know, Thanks!
> Lorena

