lucene-solr-commits mailing list archives

From Apache Wiki <>
Subject [Solr Wiki] Update of "SolrCloud" by YonikSeeley
Date Wed, 16 Dec 2009 22:10:36 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrCloud" page has been changed by YonikSeeley.


  Have some sort of command list that every server should execute before certain actions (e.g. hitting URLs, executing system commands, etc.)?
+ === Distributed Search ===
+ ==== Basic Distributed Search ====
+ The state of the cluster will be read at startup. Changes to the state will be immediately reflected in the internal representation via ZooKeeper watches. Once a cluster state has been built, a connection to ZooKeeper is not needed to serve requests (i.e. requests can still be served while disconnected from ZooKeeper).
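A minimal sketch of the idea above, assuming a watch callback that delivers the full new cluster state. `ClusterStateCache`, `on_watch_event` and `current` are hypothetical names, not Solr or ZooKeeper APIs, and wiring to a real ZooKeeper client is omitted (standard ZooKeeper watches are one-time triggers, so a real client must re-register after each event). Reads never touch ZooKeeper, which is what lets the node keep serving while disconnected.

```python
import threading
from types import MappingProxyType


class ClusterStateCache:
    """Hypothetical in-memory cluster model kept current by watch callbacks."""

    def __init__(self, initial_state):
        self._lock = threading.Lock()
        self._state = self._freeze(initial_state)

    @staticmethod
    def _freeze(state):
        # Read-only mapping of shard name -> tuple of nodes serving it.
        return MappingProxyType({shard: tuple(nodes) for shard, nodes in state.items()})

    def on_watch_event(self, new_state):
        # Invoked when a (hypothetical) watch fires: swap in a new snapshot.
        frozen = self._freeze(new_state)
        with self._lock:
            self._state = frozen

    def current(self):
        # No ZooKeeper round trip; return the latest snapshot we have.
        with self._lock:
            return self._state


cache = ClusterStateCache({"shard1": ["nodeA", "nodeB"]})
cache.on_watch_event({"shard1": ["nodeB"]})
print(cache.current()["shard1"])  # ('nodeB',)
```

Swapping in a whole new immutable snapshot, rather than mutating the old one, means a reader never sees a half-applied update.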
+ Implementation detail: certain information about the internal representation of the cluster should be copied at the start of a request and probably shouldn't change during the request. This probably includes the shards that will be included in the request (we don't want that changing between phases of a request) and the nodes we are querying for those shards. Someone may take a node out of service, or ZooKeeper may have marked the node as failed, but we can simply continue using the normal request/failover logic for the duration of that distributed request.
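A sketch of the per-request copy described above, assuming the cluster state is a mapping from shard name to the nodes serving it. `RequestPlan` and `start_request` are hypothetical names. The plan is built once when the request starts and then passed to every phase (e.g. a query phase followed by a stored-field retrieval phase), so later changes to the shared model cannot leak into an in-flight request.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Tuple


@dataclass(frozen=True)
class RequestPlan:
    """Hypothetical per-request copy of the shards and nodes to query."""
    shards: Tuple[str, ...]
    nodes: Mapping[str, Tuple[str, ...]]


def start_request(cluster_state, requested_shards=None):
    # Default to every shard in the cluster if none were requested.
    shards = tuple(requested_shards) if requested_shards else tuple(sorted(cluster_state))
    # Copy only what this request needs, as tuples, so it cannot change later.
    nodes = MappingProxyType({s: tuple(cluster_state[s]) for s in shards})
    return RequestPlan(shards=shards, nodes=nodes)


state = {"shard1": ["nodeA", "nodeB"], "shard2": ["nodeC"]}
plan = start_request(state)
state["shard1"].remove("nodeA")  # cluster changes mid-request
print(plan.nodes["shard1"])      # ('nodeA', 'nodeB')
```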
+ Connection-refused errors from solr_server->solr_server requests (or other errors that we believe would not occur if the request were executed on a different node) should result in failover behavior (re-sending the request to a different node that serves the same shard). It can be a local policy decision not to try that node again for a certain amount of time after a given number of these errors. ZooKeeper does not need to be updated with this information (but could be in the future).
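One way the failover and local "don't try that node again for a while" policy might look, as a sketch only. `NodePenaltyBox`, `query_shard`, `max_errors` and `cooldown_s` are hypothetical names and parameters, and `send` stands in for whatever actually issues the solr_server->solr_server request. Only Python's built-in `ConnectionRefusedError` triggers failover here; nothing is written to ZooKeeper.

```python
import time


class NodePenaltyBox:
    """Hypothetical local policy: after max_errors failures, skip a node for cooldown_s seconds."""

    def __init__(self, max_errors=3, cooldown_s=30.0, clock=time.monotonic):
        self.max_errors = max_errors
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._errors = {}       # node -> consecutive failures
        self._skip_until = {}   # node -> clock time when it may be tried again

    def usable(self, node):
        return node not in self._skip_until or self.clock() >= self._skip_until[node]

    def record_failure(self, node):
        count = self._errors.get(node, 0) + 1
        if count >= self.max_errors:
            # Too many failures: bench the node locally and reset its counter.
            self._skip_until[node] = self.clock() + self.cooldown_s
            count = 0
        self._errors[node] = count

    def record_success(self, node):
        self._errors.pop(node, None)


def query_shard(replicas, send, penalties):
    """Try each usable node serving one shard, failing over on connection-refused errors."""
    last_error = None
    for node in replicas:
        if not penalties.usable(node):
            continue
        try:
            result = send(node)
        except ConnectionRefusedError as exc:
            # Another node should succeed, so fail over instead of failing the request.
            penalties.record_failure(node)
            last_error = exc
            continue
        penalties.record_success(node)
        return result
    raise RuntimeError("no node serving this shard could be reached") from last_error
```

Injecting `clock` keeps the cooldown testable without sleeping.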
+ ==== Timeouts ====
+ ZooKeeper ephemeral znodes can be used to determine which servers are available for requests.
+ Q: If ZooKeeper dies and comes back up, does it come back with all the ephemeral nodes? If all the ephemeral nodes are deleted, we need to disregard that change and continue using our last internal model.
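The "disregard and keep the last model" rule can be sketched as a guard on each update of the live-server set, assuming the set is derived from the children of some hypothetical znode that each server registers an ephemeral node under. `next_live_nodes` is a hypothetical helper. Note that this guard cannot tell a ZooKeeper restart apart from every server genuinely going away; it simply encodes the policy stated above.

```python
def next_live_nodes(last_known, reported):
    """Hypothetical guard: accept the servers reported via ephemeral znodes,
    but treat an empty report as suspect and keep the last internal model."""
    reported = frozenset(reported)
    if not reported:
        # All ephemeral nodes vanished at once: assume a ZooKeeper-side reset.
        return frozenset(last_known)
    return reported


live = frozenset({"nodeA", "nodeB"})
live = next_live_nodes(live, ["nodeA"])  # nodeB's ephemeral znode went away
live = next_live_nodes(live, [])         # empty report is ignored
print(sorted(live))                      # ['nodeA']
```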
+ solr_server->solr_server requests may time out after "shard-socket-timeout". If a flag indicating partialResults is set, we should not retry a different shard. If that flag is not set, we either fail the request or retry a different shard, depending on a new "retryOnTimeout" flag. After a configurable number of timeouts on a node where the other shards did not time out, we can mark the node as "slow" or "timedout" in ZooKeeper. A leader could optionally act on that information to remove the node or reallocate resources.
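The timeout rules above can be summarized as a small decision function, sketched here under assumptions: `TimeoutPolicy`, `on_timeout`, `should_mark_slow` and `slow_threshold` are hypothetical names, with `partial_results` and `retry_on_timeout` mirroring the partialResults and retryOnTimeout flags. Only timeouts where the other shards answered in time count toward marking a node slow; actually writing the "slow" mark to ZooKeeper is left out.

```python
from dataclasses import dataclass, field


@dataclass
class TimeoutPolicy:
    """Hypothetical handling of a solr_server->solr_server timeout."""
    partial_results: bool = False
    retry_on_timeout: bool = False
    slow_threshold: int = 5
    timeouts: dict = field(default_factory=dict)  # node -> counted timeouts

    def on_timeout(self, node, other_shards_timed_out):
        # Count only timeouts where the rest of the request was fast;
        # if every shard timed out, this node is probably not the problem.
        if not other_shards_timed_out:
            self.timeouts[node] = self.timeouts.get(node, 0) + 1
        if self.partial_results:
            return "return_partial"  # never retry a different shard
        return "retry_elsewhere" if self.retry_on_timeout else "fail"

    def should_mark_slow(self, node):
        # A real system might record this in ZooKeeper for a leader to act on.
        return self.timeouts.get(node, 0) >= self.slow_threshold


policy = TimeoutPolicy(retry_on_timeout=True, slow_threshold=2)
print(policy.on_timeout("nodeA", other_shards_timed_out=False))  # retry_elsewhere
```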
  == Resources ==
