lucene-dev mailing list archives

From Manish Shukla <hardtopm...@yahoo.com>
Subject Indexing too slow
Date Wed, 24 Jul 2002 23:14:07 GMT
I profiled Lucene indexing using the Java profiler
option. It looks like indexing spends around 80% of its
time in StandardTokenizerManager.java and uses
practically all the CPU on my 1 GHz machine. It takes
around 17 minutes to index 140 MB of plain text; that's
around 8 MB/minute. I think it's too slow.
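
As a quick check of the figure above: 140 MB in 17 minutes
is 140 / 17, about 8.2 MB/minute, or roughly 0.14 MB/s.
The small helper below (class and method names are
hypothetical, chosen only for illustration) does the same
conversion from a byte count and elapsed wall-clock time,
which is handy when comparing runs before and after a
tokenizer change. It assumes 1 MB = 2^20 bytes.

```java
import java.util.Locale;

/** Hypothetical helper: converts bytes processed and elapsed time into MB/minute. */
public final class Throughput {
    // Assumption: 1 MB = 1024 * 1024 bytes.
    private static final double BYTES_PER_MB = 1024.0 * 1024.0;

    /** Returns throughput in megabytes per minute for the given byte count and elapsed milliseconds. */
    public static double mbPerMinute(long bytes, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        double megabytes = bytes / BYTES_PER_MB;
        double minutes = elapsedMillis / 60000.0;
        return megabytes / minutes;
    }

    public static void main(String[] args) {
        long bytes = 140L * 1024 * 1024; // 140 MB of plain text
        long millis = 17L * 60 * 1000;   // 17 minutes
        // Locale.ROOT keeps '.' as the decimal separator; prints "8.2 MB/minute".
        System.out.printf(Locale.ROOT, "%.1f MB/minute%n", mbPerMinute(bytes, millis));
    }
}
```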

Especially when we just want to tokenize on whitespace
and a few standard delimiters, I want to speed it up by
changing the grammar.
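
For the whitespace-plus-delimiters case, one alternative
to a generated grammar is a hand-written single-pass
scanner. The sketch below is hypothetical (it is not a
Lucene class, and hooking it into Lucene's analysis
classes is not shown); it only illustrates splitting on
Character.isWhitespace plus a configurable delimiter set.
Whether it actually beats the JavaCC-generated token
manager would need to be measured.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical hand-written tokenizer: one pass over the input, splitting on
 * whitespace and a fixed set of delimiter characters, with no generated grammar.
 */
public final class DelimiterTokenizer {
    private final String delimiters;

    public DelimiterTokenizer(String delimiters) {
        this.delimiters = delimiters;
    }

    /** True if c separates tokens: any whitespace or one of the configured delimiters. */
    private boolean isSeparator(char c) {
        return Character.isWhitespace(c) || delimiters.indexOf(c) >= 0;
    }

    /** Returns the tokens of text in order; runs of separators produce no empty tokens. */
    public List<String> tokenize(CharSequence text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start index of the current token, or -1 when between tokens
        for (int i = 0; i < text.length(); i++) {
            if (isSeparator(text.charAt(i))) {
                if (start >= 0) {
                    tokens.add(text.subSequence(start, i).toString());
                    start = -1;
                }
            } else if (start < 0) {
                start = i;
            }
        }
        if (start >= 0) { // flush a token that runs to the end of the input
            tokens.add(text.subSequence(start, text.length()).toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        DelimiterTokenizer t = new DelimiterTokenizer(".,;:!?()\"");
        // Prints [Indexing, too, slow, 140, MB, 17, minutes]
        System.out.println(t.tokenize("Indexing too slow: 140 MB, 17 minutes."));
    }
}
```

No lowercasing or stop-word removal is done here; those
would be separate passes over the token list.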

Just wondering if anybody has done any tests in this
area, or has other clues as to how to speed it up. I am
wondering if we should use YACC instead of JavaCC and
call it through JNI. It shouldn't matter because of
HotSpot, but you never know.

-Manish


--
To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>

