pdfbox-dev mailing list archives

From Tim Allison <talli...@apache.org>
Subject Comparing extracted text with pdftotext
Date Mon, 26 Nov 2018 20:49:55 GMT
All,

  I just finished drafting a high-level "lab report" comparing
pdftotext and Tika/PDFBox on the PDFs in our refreshed regression
corpus: https://wiki.apache.org/tika/ComparisonTikaAndPDFToText201811.
The more interesting bits are in the actual reports from tika-eval
and/or the comparison database available here:
http://162.242.228.174/pdf_parsing/pdftotextVPDFBox_201811/

  Let me know what you think.

          Cheers,

                   Tim


