pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Tabular Data Extracting
Date Sun, 14 May 2017 12:15:42 GMT
On 14.05.2017 at 13:52, Alina Babenko wrote:
> Hello, my name is Alina, I'm a student from Ukraine. I'm working on my student project,
> trying to get data from a table using C#. I've used your PDFBox 2.0, but it only let me
> take the data as plain text, without the cell structure. Is it possible to get data from
> specific cells of the table using your PDFBox 2.0? Do you have any examples? Could you
> help me to solve this problem?
> I would really appreciate your help.
> Thank you so much.
>

Hello Alina,

PDFBox does not support tabular data extraction because the PDF format
itself has no concept of tables (except in tagged PDFs, which are rare).
You may want to have a look at Tabula.

http://tabula.technology/

If you know the cell positions in advance, you can use PDFBox: search
the source code examples for ExtractTextByArea.
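
As a rough illustration of the "positions known in advance" approach, here
is a minimal sketch that turns hand-measured grid lines into one named
rectangle per cell. The class name CellRegions, the method build, and all
coordinates are hypothetical and only serve the example. It assumes the
column and row boundaries are given in points, with the origin at the top
left of the page and y growing downward, which as far as I know is the
convention PDFTextStripperByArea expects.

```java
import java.awt.geom.Rectangle2D;
import java.util.LinkedHashMap;
import java.util.Map;

public class CellRegions {

    /**
     * Builds one named rectangle per table cell.
     * xs: column boundaries, left to right (hypothetical, measured by hand).
     * ys: row boundaries, top to bottom (hypothetical, measured by hand).
     * Cell names follow the pattern "r<row>c<column>", both zero-based.
     */
    static Map<String, Rectangle2D> build(double[] xs, double[] ys) {
        Map<String, Rectangle2D> regions = new LinkedHashMap<>();
        for (int r = 0; r + 1 < ys.length; r++) {
            for (int c = 0; c + 1 < xs.length; c++) {
                double width = xs[c + 1] - xs[c];
                double height = ys[r + 1] - ys[r];
                regions.put("r" + r + "c" + c,
                        new Rectangle2D.Double(xs[c], ys[r], width, height));
            }
        }
        return regions;
    }

    public static void main(String[] args) {
        // Assumed layout: a 2 x 2 table with made-up boundaries.
        double[] xs = {50, 200, 350};
        double[] ys = {100, 120, 140};
        build(xs, ys).forEach((name, rect) ->
                System.out.println(name + " -> " + rect));
    }
}
```

With PDFBox 2.0 on the classpath, each entry could then be registered with
PDFTextStripperByArea.addRegion(name, rect), followed by one call to
extractRegions(page) and a getTextForRegion(name) per cell. This only works
when every page uses the same fixed layout; for tables whose position
varies, a tool like Tabula is the better fit.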

Tilman

