lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Dmitry's Term Vector stuff, plus some
Date Wed, 25 Feb 2004 22:13:42 GMT
markharw00d@yahoo.co.uk wrote:
> Bruce,
> Could a short-term (and possibly compromised) solution to your performance problem be
> to offer only the first 3k of these large 200k docs to the highlighter, in order to
> minimize the amount of tokenization required? Arguably the most relevant bit of a
> document is typically in the first 1k anyway?

Or perhaps the highlighter could be changed to stop tokenizing a 
document after 1000 tokens, once enough fragments have been found to 
produce a summary.  That way, if there are hits in the first part of the 
document, as there usually are for high-scoring hits, then the time to 
compute the summary is bounded by something less than the document size.
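A minimal sketch of the idea, not the actual Lucene Highlighter API: the class name, whitespace "tokenizer", and both cutoff parameters below are hypothetical stand-ins. The loop stops as soon as either the token budget is spent or enough matching fragments have been collected, so work is bounded regardless of document length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only -- not Lucene code.  Tokenization is
// faked with a whitespace split; a real analyzer would be used instead.
public class BoundedHighlighter {

    public static List<String> highlight(String text, String queryTerm,
                                         int maxTokens, int enoughFragments) {
        List<String> fragments = new ArrayList<>();
        String[] tokens = text.split("\\s+");  // stand-in for real tokenization
        int seen = 0;
        for (String token : tokens) {
            if (seen++ >= maxTokens) {
                break;  // bound the work by token count (e.g. 1000)
            }
            if (fragments.size() >= enoughFragments) {
                break;  // already have enough fragments for a summary
            }
            if (token.equalsIgnoreCase(queryTerm)) {
                fragments.add("<b>" + token + "</b>");
            }
        }
        return fragments;
    }

    public static void main(String[] args) {
        List<String> frags = highlight(
            "lucene makes search easy and lucene scales well",
            "lucene", 1000, 2);
        System.out.println(frags.size()); // prints 2
    }
}
```

With both cutoffs in place, a 200k document with hits near the top costs no more to summarize than a 3k one.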

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

