pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: fwd: A Benchmark and Evaluation for Text Extraction from PDF
Date Sat, 15 Jul 2017 13:14:00 GMT

> - they said iText is similar to PDFBox (page 7 upper right) ;-)

I noticed that one and also mentioned it in my mail to them.

Another thing I just saw today:

pdf2xml, which was said to be based on Apache Tika, is also based on
PDFBox 1.1.0:
https://bitbucket.org/tiedemann/pdf2xml/src/65b534eb6f10d2251185065bedc3ee7416bc5831/share/lib/pdfxtk/?at=master
and tika-app 1.3
https://bitbucket.org/tiedemann/pdf2xml/src/65b534eb6f10d2251185065bedc3ee7416bc5831/share/lib/?at=master

Tilman

>
> Andreas
>
>>
>> PDFBox is the best in 4 categories, the worst in one (missing 
>> newlines), and near the top in one (lack of errors). I have asked the 
>> authors to name me some of the files re: missing newlines, and the 
>> two error files.
>>
>> Tilman
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>



