lucene-java-user mailing list archives

From: Otis Gospodnetic
Subject: Re: Slow queries with lots of hits
Date: Fri, 05 Dec 2008 03:18:06 GMT
Tim (and we should move this to java-dev if it gains traction),

Perhaps you can come up with a mechanism to perform scoring in two passes instead of one:
1) a first pass that is cheap and fast
2) a second pass that is more expensive and slower

Currently there is no choice: Lucene always does 2). But perhaps you can come up with a
generic way to do 1)?
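
To make the two-pass idea concrete, here is a minimal sketch in plain Java. It does not use any Lucene API; the class TwoPassScoring, the Scored holder, and the cheap/expensive scorer functions are all hypothetical names invented for illustration. The first pass keeps a bounded set of candidates using the cheap scorer, and only those candidates are handed to the expensive scorer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntToDoubleFunction;

// Hypothetical illustration only: not part of Lucene.
final class TwoPassScoring {

    /** A document id paired with the score it received. */
    public static final class Scored {
        public final int doc;
        public final double score;

        public Scored(int doc, double score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Higher score first; ties broken by lower doc id so results are repeatable.
    static final Comparator<Scored> BEST_FIRST =
            Comparator.comparingDouble((Scored s) -> s.score).reversed()
                      .thenComparingInt(s -> s.doc);

    /** Scores every doc with the given scorer and returns the best k, best first. */
    public static List<Scored> topK(int[] docs, IntToDoubleFunction scorer, int k) {
        // The heap is ordered worst-first, so poll() evicts the current worst entry.
        PriorityQueue<Scored> heap = new PriorityQueue<>(BEST_FIRST.reversed());
        for (int doc : docs) {
            heap.add(new Scored(doc, scorer.applyAsDouble(doc)));
            if (heap.size() > k) {
                heap.poll();
            }
        }
        List<Scored> out = new ArrayList<>(heap);
        out.sort(BEST_FIRST);
        return out;
    }

    /**
     * Pass 1: the cheap scorer narrows all hits down to `candidates` documents.
     * Pass 2: the expensive scorer runs only on those survivors.
     */
    public static List<Scored> search(int[] hits,
                                      IntToDoubleFunction cheap,
                                      IntToDoubleFunction expensive,
                                      int candidates,
                                      int topN) {
        List<Scored> firstPass = topK(hits, cheap, candidates);
        int[] survivors = firstPass.stream().mapToInt(s -> s.doc).toArray();
        return topK(survivors, expensive, topN);
    }
}
```

With a million hits and, say, 10000 candidates, the expensive scorer would run 10000 times rather than a million. Whether this helps in practice depends on how cheap the first pass can really be made and on whether it discards documents the second pass would have ranked highly.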

Sematext -- Lucene - Solr - Nutch

----- Original Message ----
> From: Tim Sturge
> Sent: Thursday, December 4, 2008 3:27:30 PM
> Subject: Slow queries with lots of hits
> Hi all,
> I have an interesting problem with my query traffic. Most of the queries run
> in a fairly short amount of time (< 100ms) but a few take over 1000ms. These
> queries are predominantly those with a huge number of hits (>1 million hits
> in a >100 million document index). The time taken (as far as I can tell) is
> for Lucene to sit there while it scores and sorts all these results.
> However it turns out these queries really don't have top results. That is,
> of the million documents, there are easily 10000 which are decent results
> (basically those above some threshold score). Frankly, just returning some
> consistent (so paging and reload work) but
> otherwise arbitrary ranking of these 10000 results would be more than good
> enough.
> It seems to me that a solution would be to impose some sort of pseudo-random
> filter (e.g. consider only every n-th document assuming they are uniformly
> distributed). I'm wondering if anyone else has experience with this sort of
> issue and what solutions they have found to work well in practice.
> Thanks,
> Tim
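
The every-n-th-document filter described above could look roughly like the following sketch in plain Java. It does not use any Lucene API; the class SampledHits and its methods are hypothetical names. It assumes each hit carries a stable integer id (Lucene's internal document numbers can change when segments are merged, so an application-level id is assumed here) and that ids are spread roughly uniformly over the matching documents. Because the filter depends only on the id, the same query yields the same sample on every request, which keeps paging and reload consistent.

```java
import java.util.Arrays;

// Hypothetical illustration only: not part of Lucene.
final class SampledHits {

    /** Picks a stride so that roughly `target` of `hitCount` hits survive. */
    public static int strideFor(long hitCount, int target) {
        if (target <= 0) {
            throw new IllegalArgumentException("target must be positive");
        }
        return (int) Math.max(1L, hitCount / target);
    }

    /** Keeps only the hits whose id is a multiple of the stride. */
    public static int[] sample(int[] hits, int stride) {
        return Arrays.stream(hits).filter(doc -> doc % stride == 0).toArray();
    }

    /** Returns one page of the sample, ordered by id (an arbitrary but stable ranking). */
    public static int[] page(int[] hits, int stride, int pageIndex, int pageSize) {
        int[] kept = sample(hits, stride);
        Arrays.sort(kept);
        int from = Math.min(kept.length, pageIndex * pageSize);
        int to = Math.min(kept.length, from + pageSize);
        return Arrays.copyOfRange(kept, from, to);
    }
}
```

For the numbers in the message, strideFor(1_000_000, 10_000) gives a stride of 100, so about one hit in a hundred is kept. This version still enumerates every hit before filtering; the saving comes from skipping scoring and sorting for the discarded documents.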
