cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "Streaming_JA" by yutuki
Date Tue, 22 Jun 2010 02:50:20 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "Streaming_JA" page has been changed by yutuki.
http://wiki.apache.org/cassandra/Streaming_JA

--------------------------------------------------

New page:
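When data needs to be transferred between the nodes that make up a Cassandra cluster, the transfer proceeds as follows.

 1. The receiving node sends the sending node the range of data it needs.
 1. The sending node copies the required SSTable files for streaming, according to the range information it received. Because this is the reverse of "Compaction", which produces a single SSTable from multiple SSTables, this process is called "Anti-Compaction".
 1. The sending node first sends the receiving node a list of the data to be sent, and then begins transferring the actual data.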
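
The three steps above can be modelled in a few lines. The sketch below is a toy illustration only, not Cassandra's implementation: it assumes an SSTable is a list of `(token, value)` rows sorted by token, and that the requested range is a simple, non-wrapping `(start, end]` token interval. The names `anti_compact` and `plan_stream` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy model (assumption): an "SSTable" is a list of (token, value) rows sorted by token.
SSTable = List[Tuple[int, str]]


def anti_compact(sstable: SSTable, start: int, end: int) -> SSTable:
    """Copy out only the rows whose token lies in the interval (start, end]."""
    return [(token, value) for token, value in sstable if start < token <= end]


def plan_stream(sstables: Dict[str, SSTable], start: int, end: int) -> Dict[str, SSTable]:
    """Step 2: build one streamable copy per source SSTable, skipping empty results."""
    pieces = {}
    for name, table in sstables.items():
        piece = anti_compact(table, start, end)
        if piece:
            pieces[name] = piece
    return pieces


# Hypothetical data held by the sending node.
source = {
    "data-1": [(5, "a"), (20, "b"), (42, "c")],
    "data-2": [(7, "d"), (90, "e")],
    "data-3": [(95, "f")],
}

# Step 1: the receiving node asks for the token range (10, 90].
pieces = plan_stream(source, 10, 90)

# Step 3: the sending node first announces what it will send, then sends the data.
manifest = sorted(pieces)
print(manifest)
for name in manifest:
    print(name, pieces[name])
```

Note that one input file yields at most one output file here, which is the sense in which the step is the reverse of compaction: data is split out rather than merged.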

In 0.6, the status of streaming on both source and destination nodes can be monitored through
the `org.apache.cassandra.streaming.StreamingService` MBean. The `Status` attribute gives an
easy indication of what a node is doing with respect to streaming.

Step 2 takes the most time on most systems. The destination will be idle during this stage;
to monitor anti-compaction progress, check the `Compaction` MBean on the source.

Once step 3 begins actual data transfer, the sending node will report a status of
`"Waiting for transfer to $some_node to complete."` The receiving node will report
`"Receiving stream"` while receiving stream data. The `StreamDestinations` attribute lists
the hosts that the current node is sending stream data to, and the `StreamSources` attribute
lists the hosts that it is receiving stream data from.

The operations `getOutgoingFiles(host)` and `getIncomingFiles(host)` each return a list of
strings describing the status of the individual files being streamed to or from a given host.
Each string follows this format: `[path to file] [bytes sent/received]/[file size]`.

If you think that streaming is taking too long on your cluster, first check `StreamSources`
or `StreamDestinations` to find out which hosts are streaming files. Pass those hosts to
`getOutgoingFiles()` or `getIncomingFiles()` to check the status of individual files on the
problematic source and destination nodes. Streaming is conducted in 32 MB chunks, so refresh
the file status after a few seconds to see whether the sent/received values change. If they
do not change, or change more slowly than you'd like, something is wrong. Keep in mind that a
source node can only stream a single file at a time, but a destination node can receive
several files simultaneously.
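
The "refresh and compare" check described above can be scripted once the status strings have been collected (for example, copied from a JMX console). The following is a minimal sketch, assuming each string has exactly the `[path to file] [bytes sent/received]/[file size]` shape with plain integer byte counts. The helper names `parse_status` and `stalled_files` and the sample paths are hypothetical, and the code does not talk to JMX itself.

```python
import re
from typing import List, NamedTuple


class FileStatus(NamedTuple):
    path: str
    done: int  # bytes sent or received so far
    size: int  # total file size in bytes


# Assumed shape: "<path> <bytes done>/<file size>"
_STATUS = re.compile(r"^(?P<path>.+) (?P<done>\d+)/(?P<size>\d+)$")


def parse_status(line: str) -> FileStatus:
    """Split one status string into its path and byte counters."""
    m = _STATUS.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized status line: {line!r}")
    return FileStatus(m["path"], int(m["done"]), int(m["size"]))


def stalled_files(before: List[str], after: List[str]) -> List[str]:
    """Return paths of unfinished files whose byte count did not advance."""
    earlier = {s.path: s for s in map(parse_status, before)}
    stalled = []
    for s in map(parse_status, after):
        prev = earlier.get(s.path)
        if prev is not None and s.done < s.size and s.done <= prev.done:
            stalled.append(s.path)
    return stalled


# Hypothetical snapshots taken a few seconds apart.
first = [
    "/data/ks/A-1-Data.db 33554432/104857600",
    "/data/ks/B-2-Data.db 0/500",
]
second = [
    "/data/ks/A-1-Data.db 67108864/104857600",
    "/data/ks/B-2-Data.db 0/500",
]
print(stalled_files(first, second))
```

On a source node, files queued behind the one currently being sent will also show no change, since a source streams a single file at a time; a file reported here is only suspicious if it is the one that should be in flight.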
