lucene-java-user mailing list archives

From markharw...@yahoo.co.uk
Subject Re: Performance of hit highlighting and finding term positions for
Date Wed, 31 Mar 2004 18:04:40 GMT
>> Folks have benchmarked this, and, for documents less than 10k characters or so, re-tokenizing is fast enough.

As a note of warning: I did find StandardTokenizer to be the major culprit in my tokenizing benchmarks (avg. 75 ms for 16k-sized docs).
I have found I can live without StandardTokenizer in my apps.
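A rough sketch of that kind of tokenizing micro-benchmark (not the exact harness behind the numbers above; it assumes the Lucene 1.x analysis API, where TokenStream.next() returns a Token or null at end of stream, and uses a synthetic ~16k document):

    import java.io.StringReader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    public class TokenizeBench {

        public static void main(String[] args) throws Exception {
            // Build a ~16k test document (a stand-in for real text).
            StringBuffer sb = new StringBuffer();
            while (sb.length() < 16 * 1024) {
                sb.append("the quick brown fox jumps over the lazy dog ");
            }
            String text = sb.toString();

            time("StandardAnalyzer", new StandardAnalyzer(), text);
            time("SimpleAnalyzer  ", new SimpleAnalyzer(), text);
        }

        // Time a full re-tokenize of the document, as a highlighter would do.
        static void time(String name, Analyzer analyzer, String text) throws Exception {
            long start = System.currentTimeMillis();
            int count = 0;
            TokenStream stream = analyzer.tokenStream("body", new StringReader(text));
            for (Token t = stream.next(); t != null; t = stream.next()) {
                count++;
            }
            stream.close();
            long elapsed = System.currentTimeMillis() - start;
            System.out.println(name + ": " + count + " tokens in " + elapsed + " ms");
        }
    }

Swapping the analyzer is all it takes to see how much of the cost is StandardTokenizer's grammar versus tokenizing as such.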

>> The simplest is to not scan past the first 10k or so for snippets
A configurable maximum number of tokens to analyze will be a new feature in the new highlighter.
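One way such a cap could work (purely illustrative, not the actual highlighter patch; LimitedTokenStream is a hypothetical name, and the code again assumes the Lucene 1.x TokenStream API) is a wrapper stream that simply stops returning tokens after N of them:

    import java.io.IOException;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;

    // Stops emitting tokens after a fixed maximum, so the highlighter
    // never walks the tail of a very large document.
    public class LimitedTokenStream extends TokenStream {

        private final TokenStream input;
        private final int maxTokens;
        private int seen = 0;

        public LimitedTokenStream(TokenStream input, int maxTokens) {
            this.input = input;
            this.maxTokens = maxTokens;
        }

        public Token next() throws IOException {
            if (seen >= maxTokens) {
                return null;      // pretend the document ends here
            }
            Token token = input.next();
            if (token != null) {
                seen++;
            }
            return token;
        }

        public void close() throws IOException {
            input.close();
        }
    }

The highlighter would then fragment the wrapped stream rather than the raw analyzer output.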

Cheers
Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

