lucene-dev mailing list archives

From: Rajive Dave <hardtopm...@yahoo.com>
Subject: Re: Observations: profiling indexing process
Date: Wed, 20 Nov 2002 14:36:18 GMT

Yep, we replaced the JavaCC-generated tokenizer with our own
home-grown tokenizer. I think indexing got almost 100% faster,
because our documents are rather large.

Rajive
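
For illustration only, here is a minimal sketch of the kind of hand-written
tokenizer described above. It is not the tokenizer from this message: the
class name SimpleTokenizer and the tokenization rule (lowercased runs of
letters and digits) are assumptions, and it handles none of the special
token types a grammar-based tokenizer may recognize. The point is just that
a single pass over the characters needs no generated parser machinery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical hand-written tokenizer; not the code referred to in this thread. */
final class SimpleTokenizer {

    /** Splits text into lowercased runs of letters and digits in a single pass. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // index where the current token began, or -1 if between tokens
        for (int i = 0; i < text.length(); i++) {
            if (Character.isLetterOrDigit(text.charAt(i))) {
                if (start < 0) {
                    start = i; // first character of a new token
                }
            } else if (start >= 0) {
                // a non-token character ends the current token
                tokens.add(text.substring(start, i).toLowerCase(Locale.ROOT));
                start = -1;
            }
        }
        if (start >= 0) { // flush a token that runs to the end of the input
            tokens.add(text.substring(start).toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Profiling the indexing process, 2002 edition."));
    }
}
```

Whether something this simple is acceptable depends on how much of the
original tokenizer's behavior the application actually relies on.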

--- Otis Gospodnetic <otis_gospodnetic@yahoo.com>
wrote:
> Hello,
> 
> I decided to run a little Lucene app that does some indexing
> under a profiler. (I used JMP, http://www.khelekore.org/jmp/,
> a rather simple one).
> 
> The app uses StandardAnalyzer.
> I've noticed that a lot of time is spent in StandardTokenizer
> and various JavaCC-generated methods.
> I am wondering if anyone tried replacing StandardTokenizer.jj
> with something more efficient?
> 
> Also, StopFilter is using a Hashtable to store the list of
> stop words. Has anyone tried using HashMap instead?
> 
> Thanks,
> Otis
> 
> 
> --
> To unsubscribe, e-mail:  
> <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-dev-help@jakarta.apache.org>
> 
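
The stop-word question in the quoted message can be sketched as follows.
java.util.Hashtable synchronizes every call, while HashMap and HashSet do
not, so a stop list that is built once and only read afterwards does not
need the locking. The class StopWordSketch, its word list, and the method
removeStopWords are hypothetical names for illustration and are not
Lucene's StopFilter API. Whether the change is measurable in practice
would need to be checked under a profiler.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stop-word filter; illustrates an unsynchronized lookup only. */
final class StopWordSketch {

    // Built once and only read afterwards, so no synchronization is needed.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    /** Returns the tokens that are not in the stop list, in their original order. */
    static List<String> removeStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(removeStopWords(Arrays.asList("the", "index", "of", "documents")));
    }
}
```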