pdfbox-users mailing list archives

From "Hesham G." <heshamgne...@gmail.com>
Subject PDFBox 1.4 performance in extracting text is slow
Date Mon, 10 Jan 2011 15:04:17 GMT
Hello,

I am still upgrading from PDFBox 0.73 to PDFBox 1.4. The new version is very nice: it gives
better extraction results and handles more PDFs correctly. However, I have noticed that text
extraction in the new version is much slower than in version 0.73.

For example, I tested extracting the text from a 200-page PDF (page by page) with both
versions, plus some light processing of the extracted data, and the timings were very different:
version 0.73: Took 4 seconds.
version 1.4: Took 1 minute, 22 seconds.
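
A minimal sketch of how a page-by-page timing like this could be measured, so that both versions are compared on equal terms. `PageTimer` and `PageTextSource` are hypothetical names used only for illustration and are not part of PDFBox; the assumption is that each version's extraction call is wrapped behind the same small interface, and the dummy source in `main` stands in for a real PDF.

```java
import java.util.concurrent.TimeUnit;

public class PageTimer {

    /**
     * Hypothetical stand-in for "extract the text of one page".
     * A real implementation would call the PDF library and would
     * need to wrap any checked exception it throws.
     */
    public interface PageTextSource {
        String textOfPage(int pageNumber); // page numbers are 1-based
    }

    /** Simple holder for the measurement. */
    public static final class Result {
        public final int pagesProcessed;
        public final long totalChars;
        public final long elapsedNanos;

        Result(int pagesProcessed, long totalChars, long elapsedNanos) {
            this.pagesProcessed = pagesProcessed;
            this.totalChars = totalChars;
            this.elapsedNanos = elapsedNanos;
        }
    }

    /** Extracts pages 1..pageCount one at a time and times the whole loop. */
    public static Result timePages(PageTextSource source, int pageCount) {
        long start = System.nanoTime();
        long chars = 0;
        for (int page = 1; page <= pageCount; page++) {
            chars += source.textOfPage(page).length();
        }
        long elapsed = System.nanoTime() - start;
        return new Result(pageCount, chars, elapsed);
    }

    public static void main(String[] args) {
        // Dummy source standing in for a real PDF (assumption for the demo).
        PageTextSource dummy = page -> "text of page " + page;
        Result r = timePages(dummy, 200);
        System.out.println(r.pagesProcessed + " pages, "
                + r.totalChars + " chars, "
                + TimeUnit.NANOSECONDS.toMillis(r.elapsedNanos) + " ms");
    }
}
```

Timing only the extraction loop, separately from document loading and from any post-processing, would help show which step accounts for the difference.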

The extraction results were somewhat better in version 1.4, but the time taken is far too long.
Is there any way to speed up text extraction in version 1.4?

Best regards,