lucene-dev mailing list archives

From "Mark Miller (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2765) Shard/Node states
Date Sun, 09 Oct 2011 23:15:29 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123800#comment-13123800 ]

Mark Miller commented on SOLR-2765:
-----------------------------------

The current method of dealing with downed nodes is not so bad - the cluster layout is compared
with live_nodes, which lets searchers know a node is down within the ephemeral timeout. Before
that happens (a brief window), failed requests are simply retried on another replica. The
searcher locally marks the server as bad and then periodically tries it again - unless the
ephemeral goes away, in which case the node is no longer consulted at all.
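The failover behavior described above - skip replicas missing from live_nodes, locally blacklist a replica on a failed request, periodically retry it - can be sketched roughly as follows. This is a minimal illustration with hypothetical names (ReplicaChooser, markBad, etc.); the real SolrCloud request path is more involved.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the client-side failover described above (hypothetical names;
// not the actual SolrCloud implementation).
class ReplicaChooser {
    // Servers this searcher has locally marked as bad after a failed request.
    private final Set<String> locallyBad = new HashSet<>();

    // liveNodes reflects the ZooKeeper live_nodes comparison: a replica whose
    // ephemeral node is gone is simply never consulted.
    String choose(List<String> replicas, Set<String> liveNodes) {
        for (String r : replicas) {
            if (liveNodes.contains(r) && !locallyBad.contains(r)) {
                return r;
            }
        }
        return null; // nothing usable right now
    }

    // A failed request marks the replica bad so the next choose() skips it.
    void markBad(String replica) {
        locallyBad.add(replica);
    }

    // Called periodically: give a bad server another chance to rejoin rotation.
    void retry(String replica) {
        locallyBad.remove(replica);
    }
}
```

So during the brief window before the ephemeral times out, the retry on another replica comes from markBad() plus the next choose(); once the node drops out of live_nodes, the blacklist no longer matters.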

bq. The client cannot derive this information accurately from simple liveness information.

It's simply not supported that way currently - this is intentional though. If you want to
change which shards a node is responsible for serving, you don't just bring it back up with
fewer or different shards - you first delete the node's info from the cluster layout, then
you bring it up. At the time, we didn't mind that a variety of advanced scenarios require
manually editing the zk layout. We have intended to move towards a separate model and state
layout eventually though (see the SolrCloud wiki page). That is essentially the proposed
path, I think.

I'm biased, but I lean against an overseer almost more than against optimistic collection
locks - though I have not had time to fully digest the latest proposed changes. I suppose
that once you have a solid leader election process available, an overseer is fairly cheap,
and if used for the right things, fairly simple. When we get into rebalancing (we don't plan
to right away), I suppose we come back to it anyhow.
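For context on why an overseer gets cheap once leader election is solid: the standard ZooKeeper election recipe has each candidate create an ephemeral sequential znode, the lowest sequence number wins, and every other candidate watches only its immediate predecessor (avoiding a thundering herd when the leader dies). A self-contained sketch of just the selection rule, with assumed znode names and no real ZooKeeper client:

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;

// Sketch of the ZooKeeper leader-election recipe's selection rule.
// Each candidate holds an ephemeral sequential znode like "n_0000000003";
// the lowest sequence number is the leader.
class ElectionSketch {
    static String leader(Collection<String> znodes) {
        return Collections.min(znodes, Comparator.comparingInt(ElectionSketch::seq));
    }

    // The znode a candidate should watch: its immediate predecessor in
    // sequence order, or null if the candidate is itself the leader.
    static String watchTarget(Collection<String> znodes, String mine) {
        String best = null;
        for (String z : znodes) {
            if (seq(z) < seq(mine) && (best == null || seq(z) > seq(best))) {
                best = z;
            }
        }
        return best;
    }

    private static int seq(String znode) {
        return Integer.parseInt(znode.substring(znode.lastIndexOf('_') + 1));
    }
}
```

When the watched predecessor's ephemeral disappears, the candidate re-reads the children and re-applies the same rule - which is why an overseer elected this way costs little beyond the recipe itself.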

bq. marking replicas as defunct might do, 

Yeah, I think this gets complicated to do well in general. I like simple solutions like the
one above. And I think good monitoring is a perfectly acceptable requirement for a very large
cluster.


It's good stuff to consider, but exploring all of these changes should likely be spun off
into another issue. Advancements in how we handle all of this are a much larger issue than
Shard/Node states.
                
> Shard/Node states
> -----------------
>
>                 Key: SOLR-2765
>                 URL: https://issues.apache.org/jira/browse/SOLR-2765
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud, update
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>
>         Attachments: combined.patch, incremental_update.patch, scheduled_executors.patch, shard-roles.patch
>
>
> Need state for shards that indicate they are recovering, active/enabled, or disabled.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

