lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Antony Bowesman <...@teamware.com>
Subject Re: indexing performance issue
Date Fri, 01 Dec 2006 00:44:42 GMT
spinergywmy wrote:
>    I have posted this question before and this time I found that it could be
> pdfbox problem and this pdfbox I downloaded doesn't use the log4j.jar. To
> index the app 2.13mb pdf file took me 17s and total time to upload a file is
> 18s.

Re: PFDBox.

I have a 2.5Mb test file that extracts in ~4 seconds.  CPU is AMD athlon 3500+ 
at 2200MHz.  While converting it creates 4.7MB file in temp directory 
(pdfboxnnnnn.tmp).

Another 3.7MB test file takes over 20 seconds to extract and it creates 15MB 
temp file.

It seems dependent on the data, but Cygwin's PDFtotext also has similar 
performance profile over the two documents.  I would revert to the PDFBox forum 
to continue the discussion.

Antony



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message