pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Extract Text of Document with coordinates
Date Thu, 31 Mar 2016 17:58:36 GMT
On 31.03.2016 at 12:51, Felix Hermann wrote:
> Hello,
>   
> how can I extract the text + coordinates of a PDF document?
>   
> To be more precise: I would like to extract all words of the document. And for each word
> I need the coordinates of this word.
>   
> If PDFBox does not support this: How can I get the coordinates of each character?
>   
> I tried to adapt the code of this example: https://gist.github.com/DavidYKay/82f20ba67c50c499ebb3

Yes, the PrintTextLocations (or DrawPrintTextLocations) example is a 
good start. It reports the position of each character; look for the 
blanks and build the words from there.
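
A minimal sketch of the "look for the blanks" step, written without a PDFBox dependency so it runs on its own. The names WordBoxes, Glyph, Word and group are hypothetical and not part of PDFBox; Glyph is only a stand-in for the per-character data (text, x, y, width, height) that the examples print. It assumes the characters arrive in reading order and on a single line.

```java
import java.util.ArrayList;
import java.util.List;

public class WordBoxes {

    /** Hypothetical stand-in for the data of one extracted character. */
    public static final class Glyph {
        public final String unicode;
        public final float x, y, width, height;

        public Glyph(String unicode, float x, float y, float width, float height) {
            this.unicode = unicode;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** A word plus a simple bounding box (x = left edge, y = y of its first glyph). */
    public static final class Word {
        public final String text;
        public final float x, y, width, height;

        public Word(String text, float x, float y, float width, float height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** Splits the glyph sequence at blank glyphs and merges each run into one Word. */
    public static List<Word> group(List<Glyph> glyphs) {
        List<Word> words = new ArrayList<>();
        List<Glyph> run = new ArrayList<>();
        for (Glyph g : glyphs) {
            // A glyph that is empty after String.trim() counts as a blank.
            if (g.unicode.trim().isEmpty()) {
                flush(run, words);
            } else {
                run.add(g);
            }
        }
        flush(run, words); // the last word has no trailing blank
        return words;
    }

    /** Converts the collected run into a Word (if non-empty) and clears the run. */
    private static void flush(List<Glyph> run, List<Word> words) {
        if (run.isEmpty()) {
            return;
        }
        StringBuilder text = new StringBuilder();
        float minX = run.get(0).x;
        float maxX = run.get(0).x + run.get(0).width;
        float maxHeight = run.get(0).height;
        for (Glyph g : run) {
            text.append(g.unicode);
            minX = Math.min(minX, g.x);
            maxX = Math.max(maxX, g.x + g.width);
            maxHeight = Math.max(maxHeight, g.height);
        }
        words.add(new Word(text.toString(), minX, run.get(0).y, maxX - minX, maxHeight));
        run.clear();
    }
}
```

With PDFBox 2.0, one way to feed this would be a PDFTextStripper subclass that overrides writeString(String, List<TextPosition>) and builds one Glyph per TextPosition from getUnicode(), getXDirAdj(), getYDirAdj(), getWidthDirAdj() and getHeightDir(). That wiring is an assumption of this sketch and is not shown here. Text that wraps across lines or uses unusual spacing would need extra handling.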

Tilman

> However, I was not successful, as I use the new PDFBox version (2.0.0).
>   
> Regards
>   
> Felix
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
>

