lucene-solr-user mailing list archives

From "onlinespending@gmail.com" <onlinespend...@gmail.com>
Subject Solr Distributed Search "start parameter" limitation
Date Tue, 01 Feb 2011 16:03:11 GMT
The Solr wiki lists several limitations of distributed searching, one of
which concerns the start parameter:

http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations

"Makes it more inefficient to use a high "start" parameter. For example, if
you request start=500000&rows=25 on an index with 500,000+ docs per shard,
this will currently result in 500,000 records getting sent over the network
from the shard to the coordinating Solr instance. If you had a single-shard
index, in contrast, only 25 records would ever get sent over the network."

While I may not have a start parameter of 500,000, I could easily have one
of 50,000, and I am concerned about the performance hit I may take when using
such a high start parameter with distributed searching. This would come up
when a user issues a search query that results in, say, 50,000+ matches. I
may only display 40 matches per web page, but the user can "jump" to the end
of the results. So a high start parameter is quite likely, and I know this
scenario is common for a lot of websites. Are there any tricks to avoid the
performance hit associated with a high start parameter when doing distributed
searching?
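
To put rough numbers on the concern, here is a minimal sketch of the arithmetic implied by the wiki passage. It assumes a simplified model in which every shard sends its top start + rows entries to the coordinating instance, while a single-shard index sends only rows. The function name and the shard counts are hypothetical, chosen only for illustration; this is not Solr code.

```python
def entries_transferred(start: int, rows: int, num_shards: int) -> int:
    """Estimate entries sent over the network for one paged query.

    Simplified model (an assumption, not Solr's actual implementation):
    with more than one shard, each shard must return its top
    (start + rows) entries so the coordinator can merge them; with a
    single shard, only the requested page of `rows` entries is sent.
    """
    if num_shards <= 1:
        return rows
    per_shard = start + rows
    return per_shard * num_shards


if __name__ == "__main__":
    # Jumping to a page at start=50,000 with 40 results per page.
    for shards in (1, 2, 4):  # hypothetical shard counts
        total = entries_transferred(50_000, 40, shards)
        print(f"{shards} shard(s): {total} entries")
```

Under this model the cost grows with both the start offset and the number of shards, which is why a "jump to the last page" link is the worst case.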

Thanks,
Ben
