pdfbox-users mailing list archives

From Andreas Lehmkuehler <andr...@lehmi.de>
Subject Re: fwd: A Benchmark and Evaluation for Text Extraction from PDF
Date Sat, 15 Jul 2017 12:38:44 GMT
On 15.07.2017 at 13:22, Tilman Hausherr wrote:
> http://ad-publications.informatik.uni-freiburg.de/benchmark.pdf
> 
> A Benchmark and Evaluation for Text Extraction from PDF
Interesting, some details I've already found:

- they used version 2.0.3
- they said iText is similar to PDFBox (page 7, upper right) ;-)

Andreas

> 
> PDFBox is the best in 4 categories, the worst in one (missing newlines), and 
> near the top in one (lack of errors). I have asked the authors to name me some 
> of the files re: missing newlines, and the two error files.
> 
> Tilman
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
> 

