lucene-dev mailing list archives

From "Noble Paul (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud
Date Wed, 01 Apr 2015 12:32:54 GMT

     [ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Noble Paul updated SOLR-6220:
-----------------------------
    Description: 
h1.Objective
Most cloud-based systems allow users to specify rules for how the replicas/nodes of a cluster are allocated. Solr should have a flexible mechanism through which we can control the allocation of replicas, or change it later to suit the needs of the system.

All configuration is on a per-collection basis. The rules are applied whenever a replica is created in any of the shards of a given collection during:

 * collection creation
 * shard splitting
 * add replica
 * createshard

There are two aspects to how replicas are placed: the snitch and the placement rules. 

h2.snitch 
A snitch defines how to identify the tags of a node, i.e. it provides the tag/value pairs for each node. Snitches are configured through the collection create command with the {{snitch}} prefix, e.g. {{snitch.type=EC2Snitch}}.

The system provides the following implicit tag names, which cannot be used by other snitches:
 * node : the Solr node name
 * host : the hostname
 * ip : the IP address of the host
 * cores : a dynamic variable which gives the core count at any given point
 * disk : a dynamic variable which gives the available disk space at any given point
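
The exact snitch contract is not spelled out in this issue; as a rough sketch only (the interface name and method signature below are assumptions, not the actual API), a snitch essentially maps a node name to its tag/value pairs:
{code:java}
import java.util.Map;

/**
 * Hypothetical sketch of the snitch contract described above: given a Solr
 * node name (e.g. "192.168.1.2:8080_solr"), return its tag/value pairs.
 * The interface and method names are illustrative assumptions only.
 */
public interface NodeTagProvider {

  /** Returns tags such as {dc=dc1, rack=738, cores=3, disk=120} for the given node. */
  Map<String, String> getTags(String solrNodeName);
}
{code}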


There will be a few snitches provided by the system, such as:

h3.EC2Snitch
Provides two tags, {{dc}} and {{rack}}, derived from the region and availability zone values in EC2.

h3.IPSnitch 
Uses the IP address to infer the "dc" and "rack" values.

h3.NodePropertySnitch 
This lets users provide system properties on each node with tag names and values.

Example: {{-Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b}}. This means this particular node will have two tags, "tag-x" and "tag-y".
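
A minimal sketch of how such a snitch might parse that system property, assuming the comma/colon format shown above (the class and method names are made up for illustration):
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative parser for -Dsolrcloud.snitch.vals=tag-x:val-a,tag-y:val-b */
public class NodePropertyTagParser {

  public static Map<String, String> parse(String propertyValue) {
    Map<String, String> tags = new LinkedHashMap<>();
    if (propertyValue == null || propertyValue.isEmpty()) return tags;
    for (String pair : propertyValue.split(",")) {
      String[] kv = pair.split(":", 2);   // "tag-x:val-a" -> ["tag-x", "val-a"]
      if (kv.length == 2) tags.put(kv[0].trim(), kv[1].trim());
    }
    return tags;
  }

  public static void main(String[] args) {
    // Prints: {tag-x=val-a, tag-y=val-b}
    System.out.println(parse(System.getProperty("solrcloud.snitch.vals",
        "tag-x:val-a,tag-y:val-b")));
  }
}
{code}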
 
h3.RestSnitch 
Lets the user configure a URL which the server can invoke to get all the tags for a given node.

This takes extra parameters in the create command, for example: {{snitch={type=RestSnitch,url=http://snitchserverhost:port/[node]}}}
The response of the REST call {{http://snitchserverhost:port/?nodename=192.168.1:8080_solr}} must be in JSON format, e.g.:
{code:JavaScript}
{
  "tag-x": "x-val",
  "tag-y": "y-val"
}
{code}
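
A rough sketch of the client side of such a snitch, assuming Java 11's {{java.net.http}} client and Jackson for the JSON parsing; the URL-template handling and class name are illustrative, not the actual implementation:
{code:java}
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;

/** Illustrative sketch: call the configured snitch URL and read back the tag map. */
public class RestTagFetcher {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  public static Map<String, String> fetchTags(String urlTemplate, String nodeName)
      throws Exception {
    // e.g. "http://snitchserverhost:8000/[node]" with the node name substituted in
    String url = urlTemplate.replace("[node]", nodeName);
    HttpResponse<String> rsp = HttpClient.newHttpClient()
        .send(HttpRequest.newBuilder(URI.create(url)).GET().build(),
              HttpResponse.BodyHandlers.ofString());
    // Expecting a flat JSON object such as {"tag-x":"x-val","tag-y":"y-val"}
    return MAPPER.readValue(rsp.body(), new TypeReference<Map<String, String>>() {});
  }
}
{code}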
h3.ManagedSnitch
This snitch keeps a list of nodes and their tag/value pairs in ZooKeeper. The user should be able to manage the tags and values of each node through a collection API.
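
The storage layout is not specified here; a minimal sketch, assuming each node's tags were kept as a small JSON document under a per-node ZooKeeper path (the path, the use of Jackson, and the class name are all assumptions):
{code:java}
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import org.apache.zookeeper.ZooKeeper;

/** Illustrative sketch: read a node's tag map from a hypothetical ZooKeeper path. */
public class ManagedTagReader {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  public static Map<String, String> readTags(ZooKeeper zk, String nodeName) throws Exception {
    // Hypothetical layout: /snitch/tags/<nodeName> holding e.g. {"dc":"dc1","rack":"738"}
    byte[] data = zk.getData("/snitch/tags/" + nodeName, false, null);
    return MAPPER.readValue(new String(data, StandardCharsets.UTF_8),
        new TypeReference<Map<String, String>>() {});
  }
}
{code}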


h2.Rules 

A rule tells how many replicas of a given shard need to be assigned to nodes with the given key/value pairs. These parameters will be passed to the collection CREATE API as a multivalued parameter {{rule}}. The values will be saved in the state of the collection as follows:
{code:Javascript}
{
  "mycollection": {
    "snitch": {
      "type": "EC2Snitch"
    },
    "rules": [
      {"key1": "value1", "key2": "value2"},
      {"key1": "x", "a": "b"}
    ]
  }
}
{code}

A rule is specified in a pseudo-JSON syntax, which is a map of keys and values.
* Each collection can have any number of rules. As long as the rules do not conflict with each other it is OK; otherwise an error is thrown.
* In each rule, {{shard}} and {{replica}} are optional and default to {{\*}}.
* There should be at least one non-wildcard ({{\*}}) condition in each rule, and there should be at least one condition (tag) other than {{shard}} and {{replica}}.
* All keys other than {{shard}} and {{replica}} are called tags, and the tags are nothing but values provided by the snitch for each node.
* By default, certain tags such as {{node}}, {{host}}, {{port}} are provided by the system implicitly.


Examples:
{noformat}
// in each rack there can be a max of two replicas of each shard
 {rack:*,shard:*,replica:2-}
 {rack:*,replica:2-}
// in each node there should be a max of one replica of each shard
 {node:*,shard:*,replica:1-}
 {node:*,replica:1-}

// in rack 738, shard1 can have a max of 0 replicas
 {rack:738,shard:shard1,replica:0-}

// all replicas of shard1 should go to rack 730
 {shard:shard1,replica:*,rack:730}
 {shard:shard1,rack:730}

// all replicas must be created on a node with at least 20GB of free disk
 {replica:*,shard:*,disk:20+}
 {replica:*,disk:20+}
 {disk:20+}
// all replicas should be created on nodes with fewer than 5 cores
 {replica:*,shard:*,cores:5-}
 {replica:*,cores:5-}
 {cores:5-}
// one replica of shard1 must go to node 192.168.1.2:8080_solr
 {node:"192.168.1.2:8080_solr", shard:shard1, replica:1}
// no replica of shard1 should go to rack 738
 {rack:!738,shard:shard1, replica:*}
 {rack:!738,shard:shard1}
// no replica should go to rack 738
 {rack:!738,shard:*,replica:*}
 {rack:!738,shard:*}
 {rack:!738}
{noformat}
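
The examples above use a small operand syntax on condition values: a trailing {{-}} is an upper bound, a trailing {{+}} is a lower bound, a leading {{!}} means "not", and {{\*}} is a wildcard. A minimal sketch of matching one such condition against a node's tag value, with the semantics inferred from the examples rather than from the final implementation:
{code:java}
/**
 * Illustrative matcher for a single condition value from the examples above.
 * Supported forms (inferred, not a spec): "*" (any), "!738" (not equal),
 * "20+" (at least 20), "2-" (at most 2), "730" (exact match).
 */
public class ConditionMatcher {

  public static boolean matches(String condition, String actual) {
    if ("*".equals(condition)) return true;
    if (condition.startsWith("!")) return !condition.substring(1).equals(actual);
    if (condition.endsWith("+"))
      return Long.parseLong(actual) >= Long.parseLong(condition.substring(0, condition.length() - 1));
    if (condition.endsWith("-"))
      return Long.parseLong(actual) <= Long.parseLong(condition.substring(0, condition.length() - 1));
    return condition.equals(actual);
  }

  public static void main(String[] args) {
    System.out.println(matches("20+", "64"));   // true: at least 20GB free disk
    System.out.println(matches("!738", "738")); // false: rack 738 is excluded
    System.out.println(matches("2-", "3"));     // false: exceeds the allowed 2 replicas
  }
}
{code}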



In the collection create API, all the placement rules are provided through the multivalued {{rule}} parameter described above, together with the {{snitch}} parameter.
example:
{noformat}
snitch={type:EC2Snitch}&rule={shard:*,replica:1,dc:dc1}&rule={shard:*,replica:2-,dc:dc3}&rule={shard:shard1,replica:*,rack:!738}

{noformat}
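
For illustration only (the host, port, collection name, and shard count below are made up), the snitch and rule values have to be URL-encoded when the CREATE call is issued over HTTP; a rough sketch of assembling such a request:
{code:java}
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch: assemble a collection CREATE URL with snitch and rule parameters. */
public class CreateUrlBuilder {

  public static void main(String[] args) {
    String base = "http://localhost:8983/solr/admin/collections?action=CREATE"
        + "&name=mycollection&numShards=2";
    String[] rules = {
        "{shard:*,replica:1,dc:dc1}",
        "{shard:*,replica:2-,dc:dc3}",
        "{shard:shard1,replica:*,rack:!738}"
    };
    StringBuilder url = new StringBuilder(base)
        .append("&snitch=").append(enc("{type:EC2Snitch}"));
    for (String rule : rules) {
      url.append("&rule=").append(enc(rule)); // the rule parameter is multivalued
    }
    System.out.println(url);
  }

  private static String enc(String s) {
    return URLEncoder.encode(s, StandardCharsets.UTF_8);
  }
}
{code}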

> Replica placement strategy for solrcloud
> ----------------------------------------
>
>                 Key: SOLR-6220
>                 URL: https://issues.apache.org/jira/browse/SOLR-6220
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

