lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mike anderson <>
Subject Re: distributed search on duplicate shards
Date Thu, 30 Sep 2010 13:56:37 GMT
Thanks for the feedback. I ended up posting a patch to JIRA
although I've made a few changes since that patch. Already from our initial
tests we've seen a 10% improvement in the 90% line for response times, which
translates to a 50% improvement in the average time.

It would be nice to know more about the current plans for SolrCloud and it's
future development road map. I've seen a few threads on here asking for more
information, but it doesn't seem like a popular subject. I'll keep an eye on
it though.


On Wed, Sep 29, 2010 at 2:46 PM, Chris Hostetter

> : 4. The first shard from a set (solr1a, solr1b) to successfully return is
> : honored, and the other requests (solr1b, if solr1a responds first, for
> : instance) are removed/ignored
> : 5. The response is completed and returned as soon as one shard from each
> set
> : responds
> It seems like a useful feature to me ... i know some folks who have
> (non Solr/Lucene based) custom search infrastructures that do roughly
> the same thing.
> : 1. What are the known disadvantages to such a strategy? (we've thought of
> a
> : few, like sets being out of sync, but they don't bother us too much)
> you wind up burning a lot of CPU, but that's not a disadvantage as much sa
> it is a trade off -- the whole point of doing something like this is that
> you'd rather burn CPU (and wasting network IO) in order to improve your
> worst case latency.
> : 2. What would this type of a feature be called? This way I can open a
> Jira
> : ticket for it
> no idea ... "redundent shard requests" comes to mind.
> : 3. Is there a preferred way to do this? My current patch (wich I can post
> : soon) works in the HTTPClient portion of SearchHandler. I keep a hash map
> of
> : the shard sets and cancel the Future<ShardResponse>'s in the
> corresponding
> : set when each response comes back.
>         ...
> : P.S I'd like to write a test for this feature but it wasn't clear from
> the
> : distributed test how to do so. Could somebody point me in the right
> : direction (an existing test, perhaps) for how to accomplish this?
> I don't relaly have a good answer for either of those questions, but the
> one thing i can suggest is thta you take a look at the SolrCloud branch
> and think about how this functionality would integrate with that (both in
> terms of implementation and in how SolrCloud unit tests work)
> As you mentioned: the current approach in SolrCloud is to load balance
> against identical shards on mutiple nodes in the cluster, but that's not
> contradictory with your idea: they can work in conjunction with eachother
> (ie: imagine "shard1" has four physical instances: "shard1Ax", "shard1Ay",
> "shard1Bq" and "shard1Bp" ... a request for "shard1" could trigger two
> "redundent parallel shard requests" for "shard1A" and "shard1B" and each
> of those requests could then load balance between the respecitve
> underlying physical shards.
> -Hoss
> --
>  ...  October 7-8, Boston
>      ...  Stump The Chump!
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

View raw message