cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "ArchitectureGossip_JP" by mayah
Date Wed, 30 Jun 2010 06:18:12 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "ArchitectureGossip_JP" page has been changed by mayah.
The comment on this change is: Translated about 1/3.
http://wiki.apache.org/cassandra/ArchitectureGossip_JP

--------------------------------------------------

New page:
(Translation in progress)

= Gossiper =

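The gossiper is responsible for ensuring that every node in the system eventually, but reliably, learns important information about the state of every other node, including nodes that are unreachable or not yet part of the cluster when a given state change occurs.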


== API ==

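Information to be gossiped is wrapped in !ApplicationState objects, which are essentially key/value
pairs (see the data structures section below for details). The gossiper propagates these to other
nodes, where they are passed to any subscribers registered through the !IEndPointStateChangeSubscriber interface. This interface provides onJoin,
onAlive and onDead methods, whose meanings should be obvious, plus an onChange method that is called when an !ApplicationState
changes. This scheme has two non-obvious properties: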

 1. If a node makes multiple changes to a given !ApplicationState key, other nodes are guaranteed to see only the most recent value, not the intermediate ones.
 1. There is no provision for deleting an !ApplicationState entirely.
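The two properties can be made concrete with the minimal Python sketch below. It is not Cassandra's Java code. `Node`, `ChangeRecorder` and the snake_case callbacks are hypothetical stand-ins for the gossiper and !IEndPointStateChangeSubscriber. The sketch assumes each node keeps one (value, version) pair per key with an ever-growing version counter. Under that assumption a new value overwrites the old one, so only the latest value is ever gossiped. A late stale copy is ignored, and there is no operation for deleting a key.

```python
import itertools


class EndPointStateChangeSubscriber:
    """Hypothetical stand-in for IEndPointStateChangeSubscriber; override as needed."""

    def on_join(self, endpoint, states): pass
    def on_alive(self, endpoint, states): pass
    def on_dead(self, endpoint, states): pass
    def on_change(self, endpoint, key, value): pass


class ChangeRecorder(EndPointStateChangeSubscriber):
    """Remembers every on_change call it receives."""

    def __init__(self):
        self.changes = []

    def on_change(self, endpoint, key, value):
        self.changes.append((endpoint, key, value))


class Node:
    def __init__(self, address):
        self.address = address
        self._versions = itertools.count(1)  # shared, ever-growing version counter
        self.local = {}        # key -> (value, version): one entry per key, no delete
        self.remote = {}       # (endpoint, key) -> (value, version)
        self.subscribers = []

    def set_state(self, key, value):
        # Overwrite the previous value; intermediate values are not kept anywhere.
        self.local[key] = (value, next(self._versions))

    def receive(self, endpoint, states):
        # Apply gossiped states, notifying subscribers only about newer versions.
        for key, (value, version) in states.items():
            known = self.remote.get((endpoint, key))
            if known is not None and known[1] >= version:
                continue  # stale or duplicate update
            self.remote[(endpoint, key)] = (value, version)
            for subscriber in self.subscribers:
                subscriber.on_change(endpoint, key, value)


a, b = Node("10.0.0.1"), Node("10.0.0.2")
recorder = ChangeRecorder()
b.subscribers.append(recorder)

for load in (5.0, 5.1, 5.2):            # three changes between gossip rounds
    a.set_state("load-information", load)
b.receive(a.address, dict(a.local))      # one gossip round carries only the latest
b.receive(a.address, {"load-information": (5.0, 1)})  # a late, stale copy
print(recorder.changes)  # [('10.0.0.1', 'load-information', 5.2)]
```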

== Gossiper implementation ==

The gossip timer task runs every second. On each run, the node initiates gossip exchanges according to the following rules:

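 1. Gossip to a random live endpoint (if there is one).
 1. With a probability that depends on the number of unreachable and live nodes, gossip to a random unreachable endpoint.
 1. If the node gossiped to in (1) was not a seed, or the number of live nodes is less than the number of seeds, gossip to a random seed with a probability that depends on the number of unreachable and live nodes.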

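These rules were designed so that, as long as the network is up, every node eventually learns the state of every other node. (Clearly, if each node contacted only one seed and then gossiped only to random nodes it already knew about, multiple seeds could leave the cluster partitioned, with each seed knowing about only a subset of the nodes in the cluster. Step 3 avoids this and other, more subtle problems.)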

In this way, a node exchanges gossip with one to three nodes in each round (or with none, if it is alone in the cluster).
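The three rules can be sketched as one Python function that picks the gossip targets for a single round. This is a sketch under stated assumptions. The function name is hypothetical, and `seeds` excludes the local node. The two probability formulas are illustrative choices, because the text above only says that the probabilities depend on the node counts.

```python
import random


def choose_gossip_targets(live, unreachable, seeds, rng):
    """Return the endpoints to gossip to in one round (hypothetical helper).

    live / unreachable: other endpoints in each state; seeds: seed endpoints,
    excluding this node. Both probability formulas are illustrative assumptions.
    """
    targets = []

    # Rule 1: gossip to a random live endpoint, if there is one.
    gossiped_to_seed = False
    if live:
        target = rng.choice(live)
        targets.append(target)
        gossiped_to_seed = target in seeds

    # Rule 2: with some probability, gossip to a random unreachable endpoint.
    if unreachable:
        p_unreachable = len(unreachable) / (len(live) + 1)
        if rng.random() < p_unreachable:
            targets.append(rng.choice(unreachable))

    # Rule 3: if rule 1 did not reach a seed, or fewer nodes are live than
    # there are seeds, gossip to a random seed with some probability.
    if seeds and (not gossiped_to_seed or len(live) < len(seeds)):
        known = len(live) + len(unreachable)
        p_seed = len(seeds) / known if known else 1.0
        if rng.random() < p_seed:
            targets.append(rng.choice(seeds))

    return targets


rng = random.Random(2010)
for _ in range(3):
    print(choose_gossip_targets(["10.0.0.2", "10.0.0.3"], ["10.0.0.4"], ["10.0.0.2"], rng))
```

Each rule contributes at most one target, which gives the one-to-three bound. A node with no live, unreachable or seed peers gets an empty list.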

== Data structures ==
=== HeartBeatState ===
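HeartBeatState consists of a generation and a version number. The generation stays the same while the server is running and increases every time the node starts; it is used to distinguish a node's state before and after a restart. The version number is shared with the application
states and guarantees ordering. Each node has exactly one HeartBeatState
associated with it.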

=== ApplicationState ===
Consists of a state value and a version number, and represents the state of a single "component" or "element"
within Cassandra. For instance, the application state for "load information" could be (5.2, 45),
meaning that the node's load is 5.2 at version 45. Similarly, a node that is bootstrapping would
have a "bootstrapping" application state such as (bxLpassF3XD8Kyks, 56), where the first element is the bootstrap
token and the second is the version. The version number is shared by application states and HeartBeatState
to guarantee ordering, and it can only grow.
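The minimal Python sketch below shows how HeartBeatState and ApplicationState can draw their versions from one shared counter. With one counter, any later update of either kind carries a larger version than any earlier one. `LocalNodeState` is a hypothetical holder, not a Cassandra class. The generation is passed in by the caller rather than computed at startup.

```python
import itertools
from dataclasses import dataclass


@dataclass
class HeartBeatState:
    generation: int  # constant while the node runs; larger after each restart
    version: int     # drawn from the node's shared version counter


@dataclass
class ApplicationState:
    value: object    # e.g. a load figure or a bootstrap token
    version: int     # same counter as the heartbeat, so it only ever grows


class LocalNodeState:
    """Hypothetical holder for a node's own heartbeat and application states."""

    def __init__(self, generation):
        self._versions = itertools.count(1)  # one counter for every state kind
        self.heartbeat = HeartBeatState(generation, next(self._versions))
        self.app_states = {}

    def beat(self):
        # Each heartbeat tick takes the next version from the shared counter.
        self.heartbeat.version = next(self._versions)

    def set_app_state(self, name, value):
        self.app_states[name] = ApplicationState(value, next(self._versions))


node = LocalNodeState(generation=1259909635)
node.set_app_state("load-information", 5.2)              # version 2
node.beat()                                             # heartbeat moves to version 3
node.set_app_state("bootstrapping", "bxLpassF3XD8Kyks")  # version 4
print(node.heartbeat)  # HeartBeatState(generation=1259909635, version=3)
```

A restart creates a new state holder with a larger generation. Peers can then tell that versions from the new process belong to a different run than those from the old one.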

=== EndPointState ===
Includes all ApplicationStates and the HeartBeatState for a given endpoint (node). An EndPointState
can hold only one ApplicationState of each type, so if an EndPointState already includes,
say, load information, new load information overwrites the old one. The ApplicationState version
number guarantees that an old value never overwrites a newer one.

=== endPointStateMap ===
An internal structure in Gossiper that holds the EndPointState of every node it has heard about,
including the node itself.
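The Python sketch below shows the overwrite rule for EndPointState and the shape of endPointStateMap. Class and field names are illustrative, not Cassandra's. An incoming ApplicationState replaces the stored one only if its version is strictly larger.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedValue:
    value: object
    version: int


@dataclass
class EndPointState:
    """Everything known about one endpoint (illustrative field names)."""
    generation: int          # from the endpoint's HeartBeatState
    heartbeat_version: int
    app_states: dict = field(default_factory=dict)  # state name -> VersionedValue

    def apply(self, name, incoming):
        # One entry per state type; only a strictly newer version replaces it.
        current = self.app_states.get(name)
        if current is None or incoming.version > current.version:
            self.app_states[name] = incoming


# endPointStateMap: endpoint address -> EndPointState, including the local node.
end_point_state_map = {"10.0.0.2": EndPointState(generation=1259911052, heartbeat_version=61)}

ep = end_point_state_map["10.0.0.2"]
ep.apply("load-information", VersionedValue(2.7, 2))
ep.apply("load-information", VersionedValue(3.1, 64))  # newer: replaces 2.7
ep.apply("load-information", VersionedValue(2.5, 40))  # older: ignored
print(ep.app_states["load-information"])  # VersionedValue(value=3.1, version=64)
```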

== Gossip exchange ==
=== GossipDigestSynMessage ===
The node starting a gossip exchange sends a GossipDigestSynMessage, which includes a list of gossip
digests. A single gossip digest consists of an endpoint address, a generation number and the maximum
version that has been seen for that endpoint. In this context, the maximum version number is the
largest version number in the endpoint's EndPointState, across its HeartBeatState and all of its ApplicationStates. An example illustrates this better:

Suppose that node 10.0.0.1 has the following information in its endPointStateMap (remember that
endPointStateMap also includes the node itself):

{{{
EndPointState 10.0.0.1
  HeartBeatState: generation 1259909635, version 325
  ApplicationState "load-information": 5.2, generation 1259909635, version 45
  ApplicationState "bootstrapping": bxLpassF3XD8Kyks, generation 1259909635, version 56
  ApplicationState "normal": bxLpassF3XD8Kyks, generation 1259909635, version 87
EndPointState 10.0.0.2
  HeartBeatState: generation 1259911052, version 61
  ApplicationState "load-information": 2.7, generation 1259911052, version 2
  ApplicationState "bootstrapping": AujDMftpyUvebtnn, generation 1259911052, version 31
EndPointState 10.0.0.3
  HeartBeatState: generation 1259912238, version 5
  ApplicationState "load-information": 12.0, generation 1259912238, version 3
EndPointState 10.0.0.4
  HeartBeatState: generation 1259912942, version 18
  ApplicationState "load-information": 6.7, generation 1259912942, version 3
  ApplicationState "normal": bj05IVc0lvRXw2xH, generation 1259912942, version 7
}}}
In this case the max version numbers for these endpoints are 325, 61, 5 and 18 respectively. A
gossip digest for endpoint 10.0.0.2 would be "10.0.0.2:1259911052:61", which essentially says
"AFAIK endpoint 10.0.0.2 is running generation 1259911052 and its maximum version is 61". When
the node sends a GossipDigestSynMessage, it includes exactly one gossip digest per known endpoint.
That is, in this case the GossipDigestSynMessage contents would be: "10.0.0.1:1259909635:325 10.0.0.2:1259911052:61
10.0.0.3:1259912238:5 10.0.0.4:1259912942:18". The HeartBeatState version number is not necessarily
always the largest, but that is by far the most common situation.
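The Python sketch below reproduces the digest construction in this example. The map restates the listing above with values dropped, since only versions matter for digests. `make_digests` is a hypothetical helper. It emits digests in map order and makes no claim about the order the real implementation uses.

```python
# endPointStateMap of node 10.0.0.1 from the listing above:
# endpoint -> (generation, heartbeat version, {application state: version})
END_POINT_STATE_MAP = {
    "10.0.0.1": (1259909635, 325, {"load-information": 45, "bootstrapping": 56, "normal": 87}),
    "10.0.0.2": (1259911052, 61, {"load-information": 2, "bootstrapping": 31}),
    "10.0.0.3": (1259912238, 5, {"load-information": 3}),
    "10.0.0.4": (1259912942, 18, {"load-information": 3, "normal": 7}),
}


def max_version(heartbeat_version, app_versions):
    """Largest version among the heartbeat and all application states."""
    return max([heartbeat_version, *app_versions.values()])


def make_digests(state_map):
    """One 'address:generation:maxVersion' digest per known endpoint."""
    return [
        f"{ep}:{gen}:{max_version(hb, apps)}"
        for ep, (gen, hb, apps) in state_map.items()
    ]


syn_message = " ".join(make_digests(END_POINT_STATE_MAP))
print(syn_message)
# 10.0.0.1:1259909635:325 10.0.0.2:1259911052:61 10.0.0.3:1259912238:5 10.0.0.4:1259912942:18
```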

=== Main code pointers: ===
{{{
Gossiper.GossipTimerTask.run: Main gossiper loop
Gossiper.makeRandomGossipDigest: Constructs gossip digest list to be used in GossipDigestSynMessage
Gossiper.makeGossipDigestSynMessage: Constructs GossipDigestSynMessage from a list of gossip digests
}}}


=== GossipDigestAckMessage ===
A node receiving a GossipDigestSynMessage examines it and replies with a GossipDigestAckMessage,
which includes ''two'' parts: a gossip digest list and an endpoint state list. From the gossip digest
list arriving in the GossipDigestSynMessage, we learn for each endpoint whether the sending
node has newer or older information than we do. An example illustrates this:

Suppose that we are now node 10.0.0.2 and our endPointStateMap is as follows:

{{{
EndPointState 10.0.0.1
  HeartBeatState: generation 1259909635, version 324
  ApplicationState "load-information": 5.2, generation 1259909635, version 45
  ApplicationState "bootstrapping": bxLpassF3XD8Kyks, generation 1259909635, version 56
  ApplicationState "normal": bxLpassF3XD8Kyks, generation 1259909635, version 87
EndPointState 10.0.0.2
  HeartBeatState: generation 1259911052, version 63
  ApplicationState "load-information": 2.7, generation 1259911052, version 2
  ApplicationState "bootstrapping": AujDMftpyUvebtnn, generation 1259911052, version 31
  ApplicationState "normal": AujDMftpyUvebtnn, generation 1259911052, version 62
EndPointState 10.0.0.3
  HeartBeatState: generation 1259812143, version 2142
  ApplicationState "load-information": 16.0, generation 1259812143, version 1803
  ApplicationState "normal": W2U1XYUC3wMppcY7, generation 1259812143, version 6
}}}
Remember that the arriving gossip digest list is: "10.0.0.1:1259909635:325 10.0.0.2:1259911052:61
10.0.0.3:1259912238:5 10.0.0.4:1259912942:18". When handling it, the receiving node performs the following
steps:

==== Sort gossip digest list ====
Sort the gossip digest list in descending order of the difference in max version number between the sender's
digest and our own information. That is, handle first those digests whose version numbers differ
the most. The amount of endpoint information that fits in one gossip message is limited, so this step
ensures that we favor sending information about the nodes where the difference is biggest
(that is, where the sending node has very old information compared to ours).
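The Python sketch below shows this sorting step for the running example, seen from node 10.0.0.2. `sort_digests` is hypothetical and makes two assumptions. It compares only version numbers and ignores generations. An endpoint we have never heard of counts as version 0. The sketch does not enforce the per-message size limit.

```python
# Receiver 10.0.0.2's own view, reduced to endpoint -> (generation, max version).
LOCAL = {
    "10.0.0.1": (1259909635, 324),
    "10.0.0.2": (1259911052, 63),
    "10.0.0.3": (1259812143, 2142),
}

# Digests arriving in the GossipDigestSynMessage: (endpoint, generation, max version).
INCOMING = [
    ("10.0.0.1", 1259909635, 325),
    ("10.0.0.2", 1259911052, 61),
    ("10.0.0.3", 1259912238, 5),
    ("10.0.0.4", 1259912942, 18),
]


def sort_digests(digests, local):
    """Order digests by |sender max version - our max version|, largest first.

    Assumptions: generations are ignored, and an unknown endpoint counts as version 0.
    """
    def difference(digest):
        endpoint, _generation, remote_max = digest
        _, local_max = local.get(endpoint, (None, 0))
        return abs(remote_max - local_max)

    return sorted(digests, key=difference, reverse=True)


for endpoint, generation, version in sort_digests(INCOMING, LOCAL):
    print(f"{endpoint}:{generation}:{version}")
```

With the example data the differences are 2137 (10.0.0.3), 18 (10.0.0.4), 2 (10.0.0.2) and 1 (10.0.0.1), so the digests are handled in that order.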

==== Examine gossip digest list ====
At this stage we go through the arriving gossip digest list and construct the two parts of
the GossipDigestAckMessage mentioned above (gossip digest list and endpoint state list). Let us
go through the example digests one by one:

'''10.0.0.1:1259909635:325''' In our own endPointStateMap the generation is the same, so 10.0.0.1
has not rebooted since we last heard of it. The version number in the digest is bigger than
our max version number (325 > 324), so we have to ask the sender what has happened since
version 324. For this purpose we include the gossip digest 10.0.0.1:1259909635:324, which says
"I know about 10.0.0.1 only up to generation 1259909635, version 324; please tell me anything
that is newer than this".

'''10.0.0.2:1259911052:61''' When examining this, we notice that we know more than the sender
about 10.0.0.2 (the generations match, but our version is bigger: 63 > 61). The sender's max version
is 61, so we look for any states that are newer than this. As we can see from the endPointStateMap,
there are two: the application state "normal" (version 62) and the HeartBeatState (version 63). We
send these states to the sender. Note that in this case we do not send
digests, because a digest only tells the maximum version number. We already know that
there is a difference, so we send the full states.

'''10.0.0.3:1259912238:5''' In this case the generations do not match. Our generation is smaller
than the arriving one, so 10.0.0.3 must have rebooted. We ask the sender for all data for
generation 1259912238, starting from the smallest version number, 0. That is, we insert the gossip digest
10.0.0.3:1259912238:0 into the reply.

'''10.0.0.4:1259912942:18''' We do not know anything about this endpoint, so we proceed in
the same manner as for 10.0.0.3 and ask for all information by inserting the digest 10.0.0.4:1259912942:0
into the reply.

At this point we have constructed the GossipDigestAckMessage, which includes the following information:

{{{
10.0.0.1:1259909635:324
10.0.0.3:1259912238:0
10.0.0.4:1259912942:0
10.0.0.2:[ApplicationState "normal": AujDMftpyUvebtnn, generation 1259911052, version 62],
[HeartBeatState, generation 1259911052, version 63]
}}}
We now send this GossipDigestAckMessage to the sender of the GossipDigestSynMessage.
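The examination walked through above can be condensed into one Python function. This is a sketch, not Gossiper.examineGossiper itself. `examine` is hypothetical, state values are omitted, and `"heartbeat"` stands in for the HeartBeatState. The branch for a sender whose generation is older than ours is an assumption, because the example does not cover that case. On the example data it yields the same digests and states as the message shown above.

```python
# Receiver 10.0.0.2's endPointStateMap: endpoint -> (generation, {state: version}).
# Values are omitted; "heartbeat" stands for the HeartBeatState.
LOCAL = {
    "10.0.0.1": (1259909635, {"heartbeat": 324, "load-information": 45,
                              "bootstrapping": 56, "normal": 87}),
    "10.0.0.2": (1259911052, {"heartbeat": 63, "load-information": 2,
                              "bootstrapping": 31, "normal": 62}),
    "10.0.0.3": (1259812143, {"heartbeat": 2142, "load-information": 1803,
                              "normal": 6}),
}

INCOMING = [
    ("10.0.0.1", 1259909635, 325),
    ("10.0.0.2", 1259911052, 61),
    ("10.0.0.3", 1259912238, 5),
    ("10.0.0.4", 1259912942, 18),
]


def examine(digests, local):
    """Build the two parts of a GossipDigestAckMessage (hypothetical helper).

    Returns (requests, states): digests asking the sender for newer data, and
    endpoint -> {state: version} entries we send because we know more.
    """
    requests, states = [], {}
    for endpoint, remote_gen, remote_max in digests:
        if endpoint not in local:
            # Unknown endpoint: ask for everything from version 0.
            requests.append((endpoint, remote_gen, 0))
            continue
        local_gen, versions = local[endpoint]
        local_max = max(versions.values())
        if remote_gen > local_gen:
            # The endpoint restarted since we last heard of it: ask for everything.
            requests.append((endpoint, remote_gen, 0))
        elif remote_gen < local_gen:
            # Assumption: the sender holds a pre-restart view, so send all our states.
            states[endpoint] = dict(versions)
        elif remote_max > local_max:
            # Same generation, sender is ahead: ask for what is newer than ours.
            requests.append((endpoint, local_gen, local_max))
        elif remote_max < local_max:
            # Same generation, we are ahead: send only states newer than theirs.
            states[endpoint] = {name: v for name, v in versions.items() if v > remote_max}
    return requests, states


requests, states = examine(INCOMING, LOCAL)
for endpoint, generation, version in requests:
    print(f"{endpoint}:{generation}:{version}")
print(states)  # {'10.0.0.2': {'heartbeat': 63, 'normal': 62}}
```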

=== Main Code Pointers: ===
{{{
GossipDigestSynVerbHandler.doVerb: Main function for handling GossipDigestSynMessage
GossipDigestSynVerbHandler.doSort: Sorts gossip digest list
Gossiper.examineGossiper: Examine gossip digest list
Gossiper.makeGossipDigestAckMessage: Constructs GossipDigestAckMessage from a list of gossip digests
}}}

=== GossipDigestAck2Message ===
Rest of gossip process here....
