pdfbox-users mailing list archives

From Thomas Fischer <fischer...@aon.at>
Subject Re: Text extraction results in strange characters
Date Thu, 23 Jun 2011 19:29:15 GMT
Hi Daniel,

I don't know if it is working yet, but the EuDML project (www.eudml.eu) is developing a tool
to do just that; see http://www.eudml.eu/first-year-demos.

All the best
Thomas


On 23.06.2011 at 18:46, Daniel Sánchez González wrote:

> Thank you very much for your explanation. I'll try to convert pdf to image and then to
> text via OCR. Which is the most accurate way to do this?

