pdfbox-users mailing list archives

From Clemens Wyss - MySign AG <clemens.w...@mysign.ch>
Subject Parsing a pdf file takes 3minutes
Date Sun, 22 Dec 2013 16:37:25 GMT
I initially posted this question on the Tika mailing list, and I also created an issue for it:
https://issues.apache.org/jira/browse/TIKA-1213
Hopefully I am now on the right list, so let me rephrase the problem I am facing:
I have several PDF documents which take up to 3 minutes to be parsed/extracted (for later
Lucene indexing).
For example, the PDF attached to the JIRA issue takes 3 minutes.

How/why is this possible? How can I speed this up?

Any help appreciated
Clemens
