lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mark Miller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3488) Create a Collections API for SolrCloud
Date Wed, 20 Jun 2012 17:58:43 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397695#comment-13397695
] 

Mark Miller commented on SOLR-3488:
-----------------------------------

Perhaps its a little too ambitious, but the reason I brought up the idea of the overseer handling
collection management every n seconds is:

Lets say you have 4 nodes with 2 collections on them. You want each collection to use as many
nodes as are available. Now you want to add a new node. To get it to participate in the existing
collections, you have to configure them, or create new compatible cores over http on the new
node. Wouldn't it be nice if the Overseer just saw the new node, that the collections had
repFactor=MAX_INT and created the cores for you?

Also, consider failure scenarios:

If you remove a collection, what happens when a node that was down comes back and had that
a piece of that collection? Your collection will be back as a single node. An Overseer process
could prune this off shortly after.

So numShards/repFactor + Overseeer smarts seems simple and good to me. But sometimes you may
want to be precise in picking shards/repliacs. Perhaps simply doing some kind of 'rack awareness'
type feature down the road is the best way to control this though. You could create connections
and weight costs using token markers for each node or something.

So I think maybe we would need a new zk node where solr instances register rather than cores?
then we know what is available to place replicas on - even if that Solr instance has no cores?

Then the Overseer would have a process that ran every n (1 min?) and looked at each collection
and its repFactor and numShards, and add or prune given the current state.

This would also account for failures on collection creation or deletion. If a node was down
and missed the operation, when it came back, within N seconds, the Overseer would add or prune
with the restored node.

It handles a lot of failures scenarios (with some lag) and makes the interface to the user
a lot simpler. Adding nodes can eventually mean just starting up a node new rather than requiring
any config. It's also easy to deal with changing the replication factor. Just update it in
zk, and when the Overseer process runs next, it will add and prune to match the latest value
(given the number of nodes available).



                
> Create a Collections API for SolrCloud
> --------------------------------------
>
>                 Key: SOLR-3488
>                 URL: https://issues.apache.org/jira/browse/SOLR-3488
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>         Attachments: SOLR-3488.patch, SOLR-3488.patch, SOLR-3488.patch, SOLR-3488_2.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message