lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From markharw...@yahoo.co.uk
Subject Re: Performance of hit highlighting and finding term positions for
Date Thu, 01 Apr 2004 08:57:18 GMT
730 msecs is the correct number for 10 * 16k docs with StandardTokenizer! 
The 11ms per doc figure in my post was for highlighlighting using a \
lower-case-filter-only analyzer. 5ms of this figure was the cost of the \
lower-case-filter-only analyzer.

73 msecs is the cost of JUST StandardTokenizer (no highlighting)
StandardAnalyzer uses StandardTokenizer so is probably used in a lot of apps. It \
tries to keep certain text eg email addresses as one term. I can live without it and \
I suspect most apps can too. I haven't looked into why its slow but I notice it does \
make use of Vectors. I think a lot of people's highlighter performance issues may \
extend from this.

Cheers
Mark
(apologies for the cross-post to lucene-dev - wrong group!)


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message