lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: ConjunctionScorer.doNext() overstays?
Date Thu, 01 Mar 2012 14:18:27 GMT
On Thu, Mar 1, 2012 at 8:49 AM, mark harwood <markharw00d@yahoo.co.uk> wrote:
> I would have assumed the many int comparisons would cost less than the superfluous disk
> accesses? (I bow to your considerable experience in this area!)
> What is the worst-case scenario on added disk reads? Could it be as bad as numberOfSegments
> x numberOfOtherscorers before the query winds up?

Well, it depends -- the disk access is a one-time cost, but the added
check is paid on every hit.  At some point the two will cross over...

I think the advance(NO_MORE_DOCS) will usually not hit disk: I believe
our skipper impl fully pre-buffers (in RAM) the top skip lists?  Even
if we do go to disk, it's likely the OS has already cached those bytes
in its IO buffer.
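
To make the trade-off concrete, here is a minimal Java sketch of a leapfrog-style conjunction over sorted posting lists. It is a toy model, not Lucene's actual ConjunctionScorer: the Postings class, the run() helper and the earlyExit flag are hypothetical names, and the early-exit check shown is only one possible form of the per-hit check under discussion. It counts advance() calls so the one-time saving at the end of the postings can be set against a check that runs on every pass through the loop.

```java
/** Toy model of an AND query over sorted posting lists; not Lucene's real code. */
final class ConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a postings iterator that counts advance() calls. */
    static final class Postings {
        private final int[] docs;   // sorted, unique, non-negative doc IDs
        private int pos = -1;       // -1 means "not positioned yet"
        int advanceCalls = 0;

        Postings(int[] docs) { this.docs = docs; }

        int docID() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        /** Moves to the first doc >= target, or NO_MORE_DOCS if there is none. */
        int advance(int target) {
            advanceCalls++;
            if (pos < 0) pos = 0;
            while (pos < docs.length && docs[pos] < target) pos++;
            return docID();
        }
    }

    /** Leapfrogs until every iterator sits on the same doc (or one is exhausted). */
    static int doNext(Postings[] its, boolean earlyExit) {
        int first = 0;
        int doc = its[its.length - 1].docID();
        while (its[first].docID() < doc) {
            // Assumed form of the extra check: it runs on every loop pass,
            // but only pays off once, when some iterator has run out.
            if (earlyExit && doc == NO_MORE_DOCS) {
                return NO_MORE_DOCS;
            }
            doc = its[first].advance(doc);
            first = (first == its.length - 1) ? 0 : first + 1;
        }
        return doc;
    }

    /** Runs the whole conjunction; returns {hit count, total advance() calls}. */
    static int[] run(boolean earlyExit, int[]... lists) {
        Postings[] its = new Postings[lists.length];
        for (int i = 0; i < lists.length; i++) its[i] = new Postings(lists[i]);

        int hits = 0;
        its[its.length - 1].advance(0);          // position the last iterator
        int doc = doNext(its, earlyExit);
        while (doc != NO_MORE_DOCS) {
            hits++;
            its[its.length - 1].advance(doc + 1); // step past the current hit
            doc = doNext(its, earlyExit);
        }

        int advances = 0;
        for (Postings p : its) advances += p.advanceCalls;
        return new int[] { hits, advances };
    }

    public static void main(String[] args) {
        int[] a = {1, 5, 9}, b = {5, 9, 12}, c = {2, 5, 9, 20};
        int[] base = run(false, a, b, c);
        int[] early = run(true, a, b, c);
        System.out.println("baseline:   hits=" + base[0] + " advances=" + base[1]);
        System.out.println("early exit: hits=" + early[0] + " advances=" + early[1]);
    }
}
```

With earlyExit off, once one list is exhausted every other iterator still receives one advance(NO_MORE_DOCS) call; with it on, those calls are skipped, but the extra comparison runs on each pass through the loop. The sketch only counts calls and says nothing about real disk or CPU cost.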

> On the index I tried, it looked like an improvement - the spreadsheet I linked to has
> the source for the benchmark on a second worksheet if you want to give it a whirl on a different
> dataset.

Maybe try it on a more balanced case?  I.e., N high-freq terms whose
freqs are "close-ish"?  And on slow queries (I think the results in your
spreadsheet are very fast queries, right?  The slowest one was ~0.95
msec per query, if I'm reading it right?).

In general I think not slowing down the worst-case queries is much
more important than speeding up the super-fast queries.

Mike
