cassandra-commits mailing list archives

From "Branimir Lambov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7032) Improve vnode allocation
Date Wed, 25 Mar 2015 16:32:53 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380174#comment-14380174 ]

Branimir Lambov commented on CASSANDRA-7032:
--------------------------------------------

Patch is up for review [here|https://github.com/apache/cassandra/compare/trunk...blambov:7032-vnode-assignment].
It gives the option to specify an "allocate_tokens_keyspace" when bringing up a node. The node's
tokens are then allocated to optimize the load distribution for the replication strategy of
that keyspace.

The allocation is currently restricted to Murmur3Partitioner and SimpleStrategy or NetworkTopologyStrategy
(is there anything else we need to support?). With the latter it cannot deal with cases where
the number of racks in the dc is more than one but less than the replication factor, which
should not be a common case.
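
The restrictions above can be written down as a small predicate. This is only an illustrative sketch in Python, not code from the patch: the function name `allocation_supported` and its parameters are hypothetical, and the strategy and partitioner are passed as plain strings.

```python
def allocation_supported(partitioner, strategy, racks_in_dc=1, replication_factor=1):
    """Hypothetical check mirroring the restrictions described in the comment."""
    # Only Murmur3Partitioner is currently handled.
    if partitioner != "Murmur3Partitioner":
        return False
    if strategy == "SimpleStrategy":
        return True
    if strategy == "NetworkTopologyStrategy":
        # Unsupported case: more than one rack in the dc,
        # but fewer racks than the replication factor.
        return racks_in_dc == 1 or racks_in_dc >= replication_factor
    return False
```

For example, a dc with two racks and a replication factor of 3 falls into the unsupported case, while a single rack or three racks would be accepted.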

There are a couple of things still left to do or explore, possibly in separate patches:
- add a dtest starting several nodes with allocation
- run a cstar_perf to see if it could show improvement for RF 2 in a 3-node cluster
- optimization of the selection for the first RF nodes in the cluster to guarantee good distribution
later (see ReplicationAwareTokenAllocator.testNewCluster)
- (if deemed worthwhile) multiple different replication factors in one datacentre; the current
code works ok when asked to allocate for each of them alternately, but this could be improved
if we consider all relevant strategies in parallel

> Improve vnode allocation
> ------------------------
>
>                 Key: CASSANDRA-7032
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>              Labels: performance, vnodes
>             Fix For: 3.0
>
>         Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java, TestVNodeAllocation.java,
TestVNodeAllocation.java, TestVNodeAllocation.java, TestVNodeAllocation.java
>
>
> It's been known for a little while that random vnode allocation causes hotspots of ownership.
It should be possible to improve dramatically on this with deterministic allocation. I have
quickly thrown together a simple greedy algorithm that allocates vnodes efficiently, and will
repair hotspots in a randomly allocated cluster gradually as more nodes are added, and also
ensures that token ranges are fairly evenly spread between nodes (somewhat tunably so). The
allocation still permits slight discrepancies in ownership, but these are bounded by the inverse
of the size of the cluster (as opposed to random allocation, which strangely gets worse as
the cluster size increases). I'm sure there is a decent dynamic programming solution to this
that would be even better.
> If on joining the ring a new node were to CAS a shared table where a canonical allocation
of token ranges lives after running this (or a similar) algorithm, we could then get guaranteed
bounds on the ownership distribution in a cluster. This will also help for CASSANDRA-6696.
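
To make the contrast between random and greedy allocation in the description concrete, here is a minimal toy sketch in Python. It is not the algorithm attached to the ticket nor the one in the patch: it assumes replication factor 1, a unit ring of floats instead of Murmur3 tokens, and a deliberately simple greedy rule (each new token takes an equal slice from the largest range of the currently most loaded node). All names (`ownership`, `imbalance`, `random_ring`, `greedy_ring`) are hypothetical.

```python
import random
from collections import defaultdict


def ownership(ring):
    """Map node -> fraction of the unit ring it owns.

    A token owns the range (predecessor token, token], wrapping around at 1.0.
    """
    ring = sorted(ring)
    owned = defaultdict(float)
    for i, (token, node) in enumerate(ring):
        pred = ring[i - 1][0]  # index -1 handles the wrap-around
        owned[node] += (token - pred) % 1.0
    return dict(owned)


def imbalance(ring):
    """Ratio of the largest ownership to the mean ownership (1.0 is perfect)."""
    loads = list(ownership(ring).values())
    return max(loads) / (sum(loads) / len(loads))


def random_ring(nodes, vnodes, rng):
    """Every node picks its tokens uniformly at random."""
    return [(rng.random(), n) for n in range(nodes) for _ in range(vnodes)]


def greedy_ring(nodes, vnodes):
    """Toy greedy allocation: each joining node carves its share out of hotspots."""
    # The first node gets evenly spaced tokens.
    ring = [(i / vnodes, 0) for i in range(vnodes)]
    for new in range(1, nodes):
        # Each new token should own 1 / (node count * vnodes) of the ring.
        share = 1.0 / ((new + 1) * vnodes)
        for _ in range(vnodes):
            ring.sort()
            loads = ownership(ring)
            loads.pop(new, None)  # only previously existing nodes donate
            donor = max(loads, key=loads.get)
            # Largest range of the donor. It is always wider than `share`,
            # because the donor holds at least the average load in `vnodes` ranges.
            _, pred = max(
                ((tok - ring[i - 1][0]) % 1.0, ring[i - 1][0])
                for i, (tok, node) in enumerate(ring)
                if node == donor
            )
            # The new token takes the first `share` of that range.
            ring.append(((pred + share) % 1.0, new))
    return sorted(ring)


if __name__ == "__main__":
    print("random:", imbalance(random_ring(8, 16, random.Random(7032))))
    print("greedy:", imbalance(greedy_ring(8, 16)))
```

In this toy setting, with at least as many vnodes per node as there are nodes, the greedy ring keeps the largest ownership within a factor of 1 + 1/vnodes of the mean, whereas the random ring usually shows a visibly larger ratio. A real allocator also has to account for replication, racks, and existing randomly placed tokens, which this sketch ignores.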



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
