lucene-java-user mailing list archives

From "Kevin A. Burton" <>
Subject Re: Performance of hit highlighting and finding term positions for
Date Fri, 02 Apr 2004 00:56:40 GMT wrote:

>730 msecs is the correct number for 10 * 16k docs with StandardTokenizer!
>The 11ms per doc figure in my post was for highlighting using a
>lower-case-filter-only analyzer. 5ms of this figure was the cost of the
>lower-case-filter-only analyzer.
>73 msecs per doc is the cost of JUST StandardTokenizer (no highlighting).
>StandardAnalyzer uses StandardTokenizer, so it is probably used in a lot of apps. It
>tries to keep certain text, e.g. email addresses, as one term. I can live without it and
>I suspect most apps can too. I haven't looked into why it's slow, but I notice it does
>make use of Vectors. I think a lot of people's highlighter performance issues may
>stem from this.
Looking at StandardTokenizer, I can't see anything that would slow it
down much... Can we get the source to your lower-case filter?
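
For anyone who wants to compare per-doc tokenizing costs on their own machine, here is a minimal timing sketch. It is NOT the lower-case filter from the quoted post and it does not use Lucene at all: the class TokenizerTiming and its methods lowerCaseTokens, syntheticDoc and millisPerDoc are hypothetical names made up for illustration, and the tokenizer is a plain-JDK stand-in that lower-cases and splits on anything that is not a letter or digit (so, unlike StandardTokenizer, it breaks email addresses apart). The 10 docs of 16k characters mirror the figures quoted above; the document text itself is synthetic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class TokenizerTiming {

    // Hypothetical stand-in tokenizer (an assumption, not Lucene code):
    // lower-cases the text and splits on any non-letter, non-digit character.
    static List<String> lowerCaseTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Builds a synthetic document of exactly `length` characters.
    static String syntheticDoc(int length) {
        String filler = "The Quick brown fox, someone@example.com jumps. ";
        StringBuilder sb = new StringBuilder(length + filler.length());
        while (sb.length() < length) {
            sb.append(filler);
        }
        sb.setLength(length);
        return sb.toString();
    }

    // Runs the tokenizer over every doc and returns the average cost per doc in ms.
    static double millisPerDoc(Function<String, List<String>> tokenizer, List<String> docs) {
        long tokenCount = 0;
        long start = System.nanoTime();
        for (String doc : docs) {
            tokenCount += tokenizer.apply(doc).size();
        }
        long elapsedNanos = System.nanoTime() - start;
        if (tokenCount < 0) {
            // Never true; only here so the tokenizer result is actually used.
            throw new IllegalStateException("unexpected token count");
        }
        return elapsedNanos / 1_000_000.0 / docs.size();
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            docs.add(syntheticDoc(16 * 1024));
        }
        // One untimed pass first, so JIT warm-up does not dominate the measurement.
        millisPerDoc(TokenizerTiming::lowerCaseTokens, docs);
        double perDoc = millisPerDoc(TokenizerTiming::lowerCaseTokens, docs);
        System.out.printf("stand-in tokenizer: %.3f ms per doc%n", perDoc);
    }
}
```

To compare against a real analyzer, pass a different function to millisPerDoc that drains that analyzer's token stream into a list. A single pass over 10 docs is a very rough measurement, so treat the numbers as indicative only.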



Please reply using PGP.    
    NewsMonster -
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
       AIM/YIM - sfburtonator,  Web -
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
  IRC - #infoanarchy | #p2p-hackers | #newsmonster
