lucene-java-user mailing list archives

From Tim Sturge <tstu...@hi5.com>
Subject Slow queries with lots of hits
Date Thu, 04 Dec 2008 20:27:30 GMT
Hi all,

I have an interesting problem with my query traffic. Most of the queries run
in a fairly short amount of time (< 100ms) but a few take over 1000ms. These
queries are predominantly those with a huge number of hits (>1 million hits
in a >100 million document index). As far as I can tell, the time goes to
Lucene scoring and sorting all of these results.

However, it turns out these queries don't really have a distinct set of top
results. That is, of the million documents, there are easily 10000 which are
decent results (basically those above some threshold score). Frankly, just
returning some consistent (so paging and reload work) but otherwise arbitrary
ranking of these 10000 results would be more than good enough.
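
To make the "consistent but arbitrary" idea concrete, here is a minimal Java sketch. It assumes scores are already available in an array indexed by document ID, which is a simplification for illustration and not how Lucene exposes them. The class and method names are hypothetical. It walks documents in ID order, keeps those at or above the threshold, and stops as soon as it has enough, so the result is stable across repeated calls on an unchanged index.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class ThresholdCutoff {
    /**
     * Returns the first maxResults document IDs whose score is at least
     * threshold, in ascending document-ID order. The array index stands in
     * for the document ID (an assumption made for this sketch).
     */
    static List<Integer> firstAboveThreshold(float[] scoresByDocId,
                                             float threshold,
                                             int maxResults) {
        List<Integer> hits = new ArrayList<>();
        // Stop scanning once enough decent results have been collected.
        for (int docId = 0;
             docId < scoresByDocId.length && hits.size() < maxResults;
             docId++) {
            if (scoresByDocId[docId] >= threshold) {
                hits.add(docId);
            }
        }
        return hits;
    }
}
```

Because nothing is sorted by score, the cost is bounded by how far the scan has to go to find maxResults qualifying documents rather than by the total number of hits.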

It seems to me that a solution would be to impose some sort of pseudo-random
filter (e.g. consider only every n-th document assuming they are uniformly
distributed). I'm wondering if anyone else has experience with this sort of
issue and what solutions they have found to work well in practice.
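
As a rough illustration of the every-n-th-document idea, the sketch below keeps a deterministic 1-in-n sample of matching document IDs. It is plain Java with no Lucene dependency; the class name and the choice of `docId % n == 0` as the sampling rule are assumptions made for the example. The same document IDs are always accepted, so paging and reload see the same subset as long as document IDs do not change.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class EveryNthFilter {
    private final int n;

    EveryNthFilter(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
    }

    /** True if this (non-negative) document ID falls in the 1-in-n sample. */
    boolean accept(int docId) {
        return docId % n == 0;
    }

    /** Keeps only the sampled IDs, preserving the input order. */
    List<Integer> sample(List<Integer> matchingDocIds) {
        List<Integer> kept = new ArrayList<>();
        for (int docId : matchingDocIds) {
            if (accept(docId)) {
                kept.add(docId);
            }
        }
        return kept;
    }
}
```

One caveat with sampling on document IDs: Lucene can renumber documents when segments are merged, so the sample is only consistent while the index is unchanged. Sampling on a hash of a stable application-level key would avoid that, at the cost of looking the key up.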

Thanks,

Tim
