lucene-dev mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: distributed search on duplicate shards
Date Wed, 29 Sep 2010 18:46:01 GMT

: 4. The first shard from a set (solr1a, solr1b) to successfully return is
: honored, and the other requests (solr1b, if solr1a responds first, for
: instance) are removed/ignored
: 5. The response is completed and returned as soon as one shard from each set
: responds

It seems like a useful feature to me ... i know some folks who have
(non Solr/Lucene based) custom search infrastructures that do roughly 
the same thing.

: 1. What are the known disadvantages to such a strategy? (we've thought of a
: few, like sets being out of sync, but they don't bother us too much)

you wind up burning a lot of CPU, but that's not a disadvantage as much as 
it is a trade off -- the whole point of doing something like this is that 
you'd rather burn CPU (and waste network IO) in order to improve your 
worst case latency.
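
To make the trade off concrete, here is a minimal sketch of "first replica 
to answer wins" for a single shard set, using only java.util.concurrent. 
It is not the patch discussed in this thread and not Solr code: the class 
name, firstResponse() and fakeRequest() are hypothetical, and the "request" 
is just a sleep standing in for an HTTP call.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RedundantShardRequests {

    /** Hypothetical stand-in for an HTTP request to one replica. */
    static Callable<String> fakeRequest(String replica, long delayMillis) {
        return () -> {
            Thread.sleep(delayMillis); // interrupted if a sibling request wins
            return replica;
        };
    }

    /** Query every replica in one set and return one successful response. */
    static String firstResponse(List<Callable<String>> replicaRequests) {
        ExecutorService pool = Executors.newFixedThreadPool(replicaRequests.size());
        try {
            // invokeAny returns the result of a task that completed
            // successfully and cancels the tasks that have not finished.
            return pool.invokeAny(replicaRequests);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("no replica answered", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String winner = firstResponse(List.of(
                fakeRequest("solr1a", 500),
                fakeRequest("solr1b", 20)));
        System.out.println("answered by " + winner);
    }
}
```

Every replica in the set does the work, so the extra CPU and network cost 
is paid on each request; what you get back is the latency of whichever 
replica happens to be fastest at that moment.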

: 2. What would this type of a feature be called? This way I can open a Jira
: ticket for it

no idea ... "redundant shard requests" comes to mind.

: 3. Is there a preferred way to do this? My current patch (which I can post
: soon) works in the HTTPClient portion of SearchHandler. I keep a hash map of
: the shard sets and cancel the Future<ShardResponse>'s in the corresponding
: set when each response comes back.
	...
: P.S I'd like to write a test for this feature but it wasn't clear from the
: distributed test how to do so. Could somebody point me in the right
: direction (an existing test, perhaps) for how to accomplish this?

I don't really have a good answer for either of those questions, but the 
one thing i can suggest is that you take a look at the SolrCloud branch 
and think about how this functionality would integrate with that (both in 
terms of implementation and in how SolrCloud unit tests work).

As you mentioned: the current approach in SolrCloud is to load balance 
against identical shards on multiple nodes in the cluster, but that's not 
contradictory with your idea: they can work in conjunction with each other 
(ie: imagine "shard1" has four physical instances: "shard1Ax", "shard1Ay", 
"shard1Bq" and "shard1Bp" ... a request for "shard1" could trigger two 
"redundant parallel shard requests" for "shard1A" and "shard1B", and each 
of those requests could then load balance between the respective 
underlying physical shards).
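
A rough sketch of how that target selection could look, assuming plain 
round-robin inside each group. ShardGroups, Group, pick() and targets() 
are hypothetical names for illustration and are not SolrCloud APIs; the 
instance names are the ones from the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ShardGroups {

    /** One redundant group, e.g. "shard1A", load balanced over its instances. */
    static final class Group {
        private final List<String> instances;
        private final AtomicInteger next = new AtomicInteger();

        Group(List<String> instances) {
            this.instances = List.copyOf(instances);
        }

        /** Round-robin choice of one physical instance (an assumed policy). */
        String pick() {
            int i = Math.floorMod(next.getAndIncrement(), instances.size());
            return instances.get(i);
        }
    }

    /** Targets for one logical-shard request: one instance from every group. */
    static List<String> targets(List<Group> groups) {
        List<String> out = new ArrayList<>();
        for (Group g : groups) {
            out.add(g.pick());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Group> shard1 = List.of(
                new Group(List.of("shard1Ax", "shard1Ay")),
                new Group(List.of("shard1Bq", "shard1Bp")));
        System.out.println(targets(shard1)); // [shard1Ax, shard1Bq]
        System.out.println(targets(shard1)); // [shard1Ay, shard1Bp]
    }
}
```

Each call yields one target per group; those targets would then be queried 
in parallel, with the first response honored and the rest ignored.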



-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss      ...  Stump The Chump!



