cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jaakko Laine (JIRA)" <j...@apache.org>
Subject [jira] Updated: (CASSANDRA-617) gossiper scalability
Date Wed, 09 Dec 2009 14:51:18 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jaakko Laine updated CASSANDRA-617:
-----------------------------------

    Attachment: spread_90.png
                spread_50.png
                averages.png

I've run some simulated gossiper tests and there are some issues with Gossiper scalability:

(1) Overall capacity to spread state information:
Due to gossiper packet size, at most 117 gossip digest messages fit into one GossipDigestSynMessage.
This means that if there are more than 117 nodes, at one round only a subset of all digests
(endpoint states) can be gossiped. This can cause starvation as some nodes' state information
has to potentially queue for long time before getting a chance to spread (for example if there
are 234 nodes, there's only 50% chance for state information to spread at each stage). Due
to this, the time it takes for one state information to reach all nodes, grows logarithmically
only to 117 nodes. Growth rate after this seems to be linearish (it is not exponential as
randomness and multiple paths between any two nodes dampen the effect of worsening max_digest_size
/ all_endpoints_size ratio). Attached chart (averages.png) shows average amount of rounds
it takes for a piece of gossip to reach given percentage of nodes. As can be seen, after cluster
size exceeds 117 nodes, the curves take a sharp turn upwards.

As can also be seen, when cluster size grows, 98% curve stays considerably lower than 100%.
When there are only a few nodes without certain piece of gossip, max_digest_size / all_endpoints_size
ratio plays bigger role and it is more difficult for certain state information to reach the
last nodes.

Attached there are also spread_100.png, spread_90.png and spread_50.png charts. These show
minimum, maximum and average time to reach 100%, 90% and 50% of the nodes respectively. Samples
per one stage (number of nodes) were only 12 gossips, so there is some room for error, but
should give some indication as to how the gossiper behaves in larger clusters.

(2) Ability of Ack message to spread endpoint state information
(a) Size of GossipDigestAckMessage is similarly restricted, so the amount of gossip digests
and endpoint states it can carry is also limited. In the simulation, once cluster size reaches
about 80-90 nodes, not all endpoint states can be included in Ack message. This means that
on top of delays caused by Syn message limitations, also Ack message capacity will cause delays
in endpoint state propagation.

(b) Another related issue is how GossipDigestAckMessage size is divided between digests and
endpoint states. Currently all digests are included first, and then, if there is room, as
many endpoint states will be included as will fit. In the simulation this was not a problem,
but it might happen that digests take so much room that not many endpoint states can be included.

(3) New nodes entering a large cluster
(a) When a new node enters a ring, it will first gossip to a seed and let it know that it
has joined the cluster. However, the way SYN/ACK/ACK2 exchange works, the seed will not volunteer
information about any other nodes it knows about. Only when a seed randomly chooses this node
to gossip to, the new node will know (through the arriving Syn message) about other nodes.
In a large cluster, this chance might be very small (less than a percent), so the node might
need to wait for considerable time before getting knowledge of other nodes.

(b) Another related issue is endpoint state size. There are currently four "move" application
states, which make the whole endpoint state rather big. During normal operation these states
are of course delivered as deltas, but when a new node enters the ring, full state needs to
be delivered. Only 9-10 of these states fit to one Ack message, so it will take some time
before all data is delivered in a big cluster.


> gossiper scalability
> --------------------
>
>                 Key: CASSANDRA-617
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-617
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.5
>            Reporter: Jaakko Laine
>             Fix For: 0.5
>
>         Attachments: averages.png, spread_100.png, spread_50.png, spread_90.png
>
>
> Improve gossiper scalability.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message