cassandra-user mailing list archives

From: Avi Kivity <...@scylladb.com>
Subject: Re: Unbalanced cluster
Date: Tue, 11 Jul 2017 07:54:08 GMT
The --shards parameter is ScyllaDB specific. Scylla divides data not 
only among nodes, but also internally within a node among cores 
(= shards in our terminology). In the past we had problems with shards 
being over- and under-utilized (much like the nodes in your cluster), 
so this simulator was developed to validate the solution.
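
For intuition, here is a rough Python sketch of the same idea (not the
actual shardsim code, which is linked in my earlier mail below). It
scatters each node's vnode tokens at random positions on the hash ring
and reports how much the most-loaded node exceeds the average ownership:

import random

def max_node_overcommit(nodes, vnodes_per_node):
    # Place every node's vnode tokens at random positions on a unit ring.
    tokens = sorted((random.random(), node)
                    for node in range(nodes)
                    for _ in range(vnodes_per_node))
    owned = [0.0] * nodes
    for i, (pos, node) in enumerate(tokens):
        # A token owns the segment between the previous token and itself;
        # the first segment wraps around past the last token.
        prev = tokens[i - 1][0] if i else tokens[-1][0] - 1.0
        owned[node] += pos - prev
    # Ratio of the busiest node to the average ownership of 1/nodes.
    return max(owned) * nodes

print(max_node_overcommit(nodes=33, vnodes_per_node=32))

Run a few times, this typically lands in the 1.3-1.5 range, consistent
with the 1.43 figure shardsim reports below.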


On 07/11/2017 10:27 AM, Loic Lambiel wrote:
> Thanks for the hint and tool!
>
> By the way, what does the --shards parameter mean?
>
> Thanks
>
> Loic
>
> On 07/10/2017 05:20 PM, Avi Kivity wrote:
>> 32 tokens is too few for 33 nodes. I have a sharding simulator [1], and
>> it shows:
>>
>>
>> $ ./shardsim --vnodes 32 --nodes 33 --shards 1
>> 33 nodes, 32 vnodes, 1 shards
>> maximum node overcommit:  1.42642
>> maximum shard overcommit: 1.426417
>>
>>
>> So that is 40% overcommit over the average. Since some nodes can also
>> be undercommitted, this easily explains the 2X difference (40%
>> overcommit + 30% undercommit = 2X).
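>>
>> In concrete terms: a node 40% above the mean holds 1.4x the average
>> share, a node 30% below holds 0.7x, and 1.4 / 0.7 is roughly 2.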
>>
>>
>> Newer versions of Cassandra have better token selection and will suffer
>> less from this.
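>>
>> For example (assuming you can upgrade to Cassandra 3.0 or later), the
>> allocate_tokens_for_keyspace option in cassandra.yaml makes a
>> bootstrapping node choose tokens that even out ownership for a given
>> keyspace's replication settings, along the lines of:
>>
>> num_tokens: 32
>> allocate_tokens_for_keyspace: blobstore
>>
>> with "blobstore" standing in for whichever keyspace carries your load.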
>>
>>
>>
>> [1] https://github.com/avikivity/shardsim
>>
>>
>> On 07/10/2017 04:02 PM, Loic Lambiel wrote:
>>> Hi,
>>>
>>> One of our clusters is becoming somewhat unbalanced, at least on some
>>> of the nodes:
>>>
>>> (output edited to remove unnecessary information)
>>> --  Address         Load       Tokens  Owns (effective)   Rack
>>> UN  192.168.1.22   2.99 TB    32      10.6%               RACK1
>>> UN  192.168.1.23   3.35 TB    32      11.7%               RACK1
>>> UN  192.168.1.20   3.22 TB    32      11.3%               RACK1
>>> UN  192.168.1.21   3.21 TB    32      11.2%               RACK1
>>> UN  192.168.1.18   2.87 TB    32      10.3%               RACK1
>>> UN  192.168.1.19   3.49 TB    32      12.0%               RACK1
>>> UN  192.168.1.16   5.32 TB    32      12.9%               RACK1
>>> UN  192.168.1.17   3.77 TB    32      12.0%               RACK1
>>> UN  192.168.1.26   4.46 TB    32      11.2%               RACK1
>>> UN  192.168.1.24   3.24 TB    32      11.4%               RACK1
>>> UN  192.168.1.25   3.31 TB    32      11.2%               RACK1
>>> UN  192.168.1.134  2.75 TB    18      7.2%                RACK1
>>> UN  192.168.1.135  2.52 TB    18      6.0%                RACK1
>>> UN  192.168.1.132  1.85 TB    18      6.8%                RACK1
>>> UN  192.168.1.133  2.41 TB    18      5.7%                RACK1
>>> UN  192.168.1.130  2.95 TB    18      7.1%                RACK1
>>> UN  192.168.1.131  2.82 TB    18      6.7%                RACK1
>>> UN  192.168.1.128  3.04 TB    18      7.1%                RACK1
>>> UN  192.168.1.129  2.47 TB    18      7.2%                RACK1
>>> UN  192.168.1.14   5.63 TB    32      13.4%               RACK1
>>> UN  192.168.1.15   2.95 TB    32      10.4%               RACK1
>>> UN  192.168.1.12   3.83 TB    32      12.4%               RACK1
>>> UN  192.168.1.13   2.71 TB    32      9.5%                RACK1
>>> UN  192.168.1.10   3.51 TB    32      11.9%               RACK1
>>> UN  192.168.1.11   2.96 TB    32      10.3%               RACK1
>>> UN  192.168.1.126  2.48 TB    18      6.7%                RACK1
>>> UN  192.168.1.127  2.23 TB    18      5.5%                RACK1
>>> UN  192.168.1.124  2.05 TB    18      5.5%                RACK1
>>> UN  192.168.1.125  2.33 TB    18      5.8%                RACK1
>>> UN  192.168.1.122  1.99 TB    18      5.1%                RACK1
>>> UN  192.168.1.123  2.44 TB    18      5.7%                RACK1
>>> UN  192.168.1.120  3.58 TB    28      11.4%               RACK1
>>> UN  192.168.1.121  2.33 TB    18      6.8%                RACK1
>>>
>>> Notice that node 192.168.1.14 owns 13.4% / 5.63 TB while node
>>> 192.168.1.13 owns only 9.5% / 2.71 TB, almost twice the load, even
>>> though both have 32 tokens.
>>>
>>> The cluster is running:
>>>
>>> * Cassandra 2.1.16 (initially bootstrapped running 2.1.2, with vnodes
>>> enabled)
>>> * RF=3 with single DC and single rack. LCS as the compaction strategy,
>>> JBOD storage
>>> * Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>>> * Node cleanup performed on all nodes
>>>
>>> Almost all of the cluster load comes from a single CF:
>>>
>>> CREATE TABLE blobstore.block (
>>>       inode uuid,
>>>       version timeuuid,
>>>       block bigint,
>>>       offset bigint,
>>>       chunksize int,
>>>       payload blob,
>>>       PRIMARY KEY ((inode, version, block), offset)
>>> ) WITH CLUSTERING ORDER BY (offset ASC)
>>>       AND bloom_filter_fp_chance = 0.01
>>>       AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>>>       AND comment = ''
>>>       AND compaction = {'tombstone_threshold': '0.1',
>>>           'tombstone_compaction_interval': '60',
>>>           'unchecked_tombstone_compaction': 'false',
>>>           'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>>>       AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>       AND dclocal_read_repair_chance = 0.1
>>>       AND default_time_to_live = 0
>>>       AND gc_grace_seconds = 172000
>>>       AND max_index_interval = 2048
>>>       AND memtable_flush_period_in_ms = 0
>>>       AND min_index_interval = 128
>>>       AND read_repair_chance = 0.0
>>>       AND speculative_retry = '99.0PERCENTILE';
>>>
>>> The payload column is almost the same size in each record.
>>>
>>> I understand that an unbalanced cluster may be the result of a bad
>>> primary key, which I believe isn't the case here.
>>>
>>> Any clue on what could be the cause? How can I re-balance it without
>>> any decommission?
>>>
>>> My understanding is that nodetool move may only be used when not using
>>> the vnodes feature.
>>>
>>> Any help appreciated, thanks!
>>>
>>> ----
>>> Loic Lambiel
>>>

