pdfbox-users mailing list archives

From Clemens Wyss DEV <clemens...@mysign.ch>
Subject Re: Parsing a pdf file takes 3minutes
Date Mon, 23 Dec 2013 15:12:55 GMT
I opened an issue for this: https://issues.apache.org/jira/browse/PDFBOX-1821

-----Original Message-----
From: Clemens Wyss - MySign AG [mailto:clemens.wyss@mysign.ch]
Sent: Sunday, 22 December 2013 17:37
To: 'users@pdfbox.apache.org'
Subject: Parsing a pdf file takes 3minutes

I initially posted this question on the Tika mailing list and also created an issue for it:
https://issues.apache.org/jira/browse/TIKA-1213
Hopefully I am now on the right list, so I will rephrase the problem I am facing:
I have several PDF documents that each take up to 3 minutes to be parsed/extracted (for later
Lucene indexing).
For example, the PDF attached to the Jira issue takes 3 minutes.

How/why is this possible? How can I improve on this?

Any help appreciated
Clemens
