lucene-dev mailing list archives

From Doug Cutting <>
Subject Re: Dmitry's Term Vector stuff, plus some
Date Wed, 25 Feb 2004 22:13:42 GMT

wrote:
> Bruce,
> Could a short-term (and possibly compromised) solution to your performance
> problem be to offer only the first 3k of these large 200k docs to the
> highlighter, in order to minimize the amount of tokenization required?
> Arguably the most relevant bit of a document is typically in the first 1k
> anyway?
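The truncation idea above can be sketched in plain Java (this is not a Lucene API; the method name and the 3k character budget are illustrative assumptions). The only wrinkle is cutting at a token boundary so the highlighter never sees half a word:

```java
// Sketch only: truncate a document to a character budget before handing
// it to the highlighter, cutting back to the last whitespace so we do
// not split a token. maxChars would be e.g. 3 * 1024 per the suggestion.
public class HighlightTruncate {
    static String truncateForHighlight(String text, int maxChars) {
        if (text.length() <= maxChars) {
            return text;              // small doc: highlight it whole
        }
        int cut = text.lastIndexOf(' ', maxChars);
        if (cut <= 0) {
            cut = maxChars;           // no space found: hard cut
        }
        return text.substring(0, cut);
    }

    public static void main(String[] args) {
        String doc = "alpha beta gamma delta";
        System.out.println(truncateForHighlight(doc, 10)); // prints "alpha beta"
    }
}
```

The caller would then pass the truncated string, rather than the full 200k document, to whatever highlighter it uses.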

Or perhaps the highlighter could be changed to stop tokenizing a 
document after 1000 tokens, once enough fragments have been found to 
produce a summary.  That way, if there are hits in the first part of the 
document, as there usually are for high-scoring hits, then the time to 
compute the summary is bounded by something less than the document size.
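The early-stop variant can be sketched the same way (again plain Java, not the Lucene highlighter API; the token budget, fragment count, and exact-match "scoring" are all stand-in assumptions). Tokenization halts as soon as either limit is reached:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Sketch only: tokenize until we either hit a token budget (e.g. 1000)
// or have collected enough matching fragments for a summary.
public class EarlyStopHighlight {
    static List<String> collectFragments(String text, String queryTerm,
                                         int maxTokens, int wantedFragments) {
        List<String> fragments = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text);
        int seen = 0;
        while (st.hasMoreTokens() && seen < maxTokens
                && fragments.size() < wantedFragments) {
            String tok = st.nextToken();
            seen++;
            if (tok.equalsIgnoreCase(queryTerm)) {
                fragments.add(tok);   // stand-in for a real scored fragment
            }
        }
        return fragments;
    }
}
```

When the hits fall early in the document, the loop exits on the fragment count and most of the text is never tokenized, which is exactly the bound described above.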

