lucene-dev mailing list archives

From "Noble Paul (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-12993) Split the state.json into 2. a small frequently modified data + a large unmodified data
Date Tue, 20 Nov 2018 08:13:00 GMT

     [ https://issues.apache.org/jira/browse/SOLR-12993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-12993:
------------------------------
    Description: 
This is just a proposal to minimize ZK load and improve the scalability of very large clusters.

Every time a small state change occurs for a collection or replica, the following file must
be updated and then read n times (where n is the number of replicas in the collection). The
proposal is to split the main file into two.
{code}
{"gettingstarted":{
    "pullReplicas":"0",
    "replicationFactor":"2",
    "router":{"name":"compositeId"},
    "maxShardsPerNode":"-1",
    "autoAddReplicas":"false",
    "nrtReplicas":"2",
    "tlogReplicas":"0",
    "shards":{
      "shard1":{
        "range":"80000000-ffffffff",
      
        "replicas":{
          "core_node3":{
            "core":"gettingstarted_shard1_replica_n1",
            "base_url":"http://10.0.0.80:8983/solr",
            "node_name":"10.0.0.80:8983_solr",
            "state":"active",
            "type":"NRT",
            "force_set_state":"false",
            "leader":"true"},
          "core_node5":{
            "core":"gettingstarted_shard1_replica_n2",
            "base_url":"http://10.0.0.80:7574/solr",
            "node_name":"10.0.0.80:7574_solr",
         
            "type":"NRT",
            "force_set_state":"false"}}},
      "shard2":{
        "range":"0-7fffffff",
        "state":"active",
        "replicas":{
          "core_node7":{
            "core":"gettingstarted_shard2_replica_n4",
            "base_url":"http://10.0.0.80:7574/solr",
            "node_name":"10.0.0.80:7574_solr",
           
            "type":"NRT",
            "force_set_state":"false"},
          "core_node8":{
            "core":"gettingstarted_shard2_replica_n6",
            "base_url":"http://10.0.0.80:8983/solr",
            "node_name":"10.0.0.80:8983_solr",
         
            "type":"NRT",
            "force_set_state":"false",
            "leader":"true"}}}}}}
{code}
The other file, {{status.json}}, is small and frequently updated:
{code}
{
    "shard1": {
      "status": "ACTIVE",
      "core_node3": {"status": "ACTIVE"},
      "core_node5": {"status": "ACTIVE"}
    },
    "shard2": {
      "status": "ACTIVE",
      "core_node7": {"status": "ACTIVE"},
      "core_node8": {"status": "ACTIVE"}
    }
}
{code}
In this example, {{status.json}} is roughly one tenth the size of the main file, so each state
change should move far less data to and from ZK.
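
The split described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the proposed Solr implementation: the helper names ({{split_state}}, {{compact_size}}, {{bytes_per_change}}) are hypothetical, the input is assumed to hold exactly one collection, and the cost model simply restates "one update plus n reads" from the description.

```python
import json


def split_state(state):
    """Split a state.json-style dict into (structure, status).

    Assumption: `state` holds exactly one collection, as in the example above.
    `structure` keeps everything except the "state" fields; `status` holds only
    those fields, laid out like the proposed status.json.
    """
    [(coll_name, coll)] = state.items()
    new_coll = {k: v for k, v in coll.items() if k != "shards"}
    new_coll["shards"] = {}
    status = {}
    for shard_name, shard in coll.get("shards", {}).items():
        new_shard = {k: v for k, v in shard.items() if k not in ("state", "replicas")}
        new_shard["replicas"] = {}
        shard_status = {}
        if "state" in shard:
            shard_status["status"] = shard["state"].upper()
        for rep_name, rep in shard.get("replicas", {}).items():
            new_shard["replicas"][rep_name] = {k: v for k, v in rep.items() if k != "state"}
            if "state" in rep:
                shard_status[rep_name] = {"status": rep["state"].upper()}
        new_coll["shards"][shard_name] = new_shard
        status[shard_name] = shard_status
    return {coll_name: new_coll}, status


def compact_size(obj):
    """Size in bytes of the compact JSON encoding of obj."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def bytes_per_change(file_size, n_replicas):
    """Rough cost of one state change: one write plus n reads of the whole file."""
    return file_size * (1 + n_replicas)


# A trimmed-down version of the example collection, with a state on every entry.
sample = {"gettingstarted": {
    "replicationFactor": "2",
    "router": {"name": "compositeId"},
    "shards": {
        "shard1": {
            "range": "80000000-ffffffff",
            "state": "active",
            "replicas": {
                "core_node3": {
                    "core": "gettingstarted_shard1_replica_n1",
                    "base_url": "http://10.0.0.80:8983/solr",
                    "node_name": "10.0.0.80:8983_solr",
                    "state": "active",
                    "type": "NRT",
                    "leader": "true"},
                "core_node5": {
                    "core": "gettingstarted_shard1_replica_n2",
                    "base_url": "http://10.0.0.80:7574/solr",
                    "node_name": "10.0.0.80:7574_solr",
                    "state": "active",
                    "type": "NRT"}}}}}}

structure, status = split_state(sample)
n = 2  # replicas in this sample collection
print("status.json:", json.dumps(status))
print("bytes per change, single file:", bytes_per_change(compact_size(sample), n))
print("bytes per change, status only:", bytes_per_change(compact_size(status), n))
```

The saving depends on how much static metadata each replica carries relative to its status entry, so the ratio on a real cluster may differ from the rough one-tenth figure quoted above.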

  was:
This is just a proposal to minimize ZK load and improve the scalability of very large clusters.


Every time a small state change occurs for a collection or replica, the following file must
be updated and then read n times (where n is the number of replicas in the collection). The
proposal is to split the main file into two.
{code:json}
{"gettingstarted":{
    "pullReplicas":"0",
    "replicationFactor":"2",
    "router":{"name":"compositeId"},
    "maxShardsPerNode":"-1",
    "autoAddReplicas":"false",
    "nrtReplicas":"2",
    "tlogReplicas":"0",
    "shards":{
      "shard1":{
        "range":"80000000-ffffffff",
      
        "replicas":{
          "core_node3":{
            "core":"gettingstarted_shard1_replica_n1",
            "base_url":"http://10.0.0.80:8983/solr",
            "node_name":"10.0.0.80:8983_solr",
            "state":"active",
            "type":"NRT",
            "force_set_state":"false",
            "leader":"true"},
          "core_node5":{
            "core":"gettingstarted_shard1_replica_n2",
            "base_url":"http://10.0.0.80:7574/solr",
            "node_name":"10.0.0.80:7574_solr",
         
            "type":"NRT",
            "force_set_state":"false"}}},
      "shard2":{
        "range":"0-7fffffff",
        "state":"active",
        "replicas":{
          "core_node7":{
            "core":"gettingstarted_shard2_replica_n4",
            "base_url":"http://10.0.0.80:7574/solr",
            "node_name":"10.0.0.80:7574_solr",
           
            "type":"NRT",
            "force_set_state":"false"},
          "core_node8":{
            "core":"gettingstarted_shard2_replica_n6",
            "base_url":"http://10.0.0.80:8983/solr",
            "node_name":"10.0.0.80:8983_solr",
         
            "type":"NRT",
            "force_set_state":"false",
            "leader":"true"}}}}}}
{code}
The other file, {{status.json}}, is small and frequently updated:

{code:json}
{
    "shard1": {
      "s": 1,
      "core_node3": {"s": 1},
      "core_node5": {"s": 1}
    },
    "shard2": {
      "s": 1,
      "core_node7": {"s": 1},
      "core_node8": {"s": 1}
    }
}
{code}

In this example, {{status.json}} is roughly one tenth the size of the main file, so each state
change should move far less data to and from ZK.


> Split the state.json into 2. a small frequently modified data + a large unmodified data
> ---------------------------------------------------------------------------------------
>
>                 Key: SOLR-12993
>                 URL: https://issues.apache.org/jira/browse/SOLR-12993
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Noble Paul
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

