From user-return-31555-apmail-cassandra-user-archive=cassandra.apache.org@cassandra.apache.org Thu Jan 31 20:19:52 2013 Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id D3FD8E60B for ; Thu, 31 Jan 2013 20:19:52 +0000 (UTC) Received: (qmail 25005 invoked by uid 500); 31 Jan 2013 20:19:50 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 24964 invoked by uid 500); 31 Jan 2013 20:19:50 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 24955 invoked by uid 99); 31 Jan 2013 20:19:50 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 31 Jan 2013 20:19:50 +0000 X-ASF-Spam-Status: No, hits=2.7 required=5.0 tests=FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_REPLYTO_END_DIGIT,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: local policy) Received: from [72.30.239.211] (HELO nm40-vm3.bullet.mail.bf1.yahoo.com) (72.30.239.211) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 31 Jan 2013 20:19:38 +0000 Received: from [98.139.215.143] by nm40.bullet.mail.bf1.yahoo.com with NNFMP; 31 Jan 2013 20:19:17 -0000 Received: from [98.139.212.209] by tm14.bullet.mail.bf1.yahoo.com with NNFMP; 31 Jan 2013 20:19:17 -0000 Received: from [127.0.0.1] by omp1018.mail.bf1.yahoo.com with NNFMP; 31 Jan 2013 20:19:17 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 731367.54775.bm@omp1018.mail.bf1.yahoo.com Received: (qmail 47580 invoked by uid 60001); 31 Jan 2013 20:19:17 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; 
t=1359663557; bh=8iBjPWI9L8L4Em0+g7ngQa0BHV1i6R8ewn0hywiWg6k=; h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type; b=qvDaB9HDFdnSFBwfPMyahVcdq63u1jRrc60leNi5jSrG1/FGxjScSHEsYugbhkQ7diSmK6QWZxZcrlsLkFXhN9Fzm7cI5QXvP68fb/6cwRM5Uu21pvBVLIlDTVsIE9oVNiaQ7NABNocy6IRlHzoGhooL7ANVC8RVWpK7lLDqGYQ= DomainKey-Signature:a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type; b=ndzZz46S9HgGljrQFXr9xqhGuvA1iGDhyusE4qArAlI8sWOLPM0t6nd43KomO5CYtTkeVj2NY2Og9DU43Uzzo/E4w/TbdQI/2sXQbYjsNFcsotx01TumARUovS/g4w7Cizt5uK/HLN9FWMcTfrBkq8IfHG56sd1kAga/x6suQ+c=; X-YMail-OSG: 2Ej5B44VM1lLakHy1vqgvwvn33xm4rJehqP6QWjuLk5N88c Z7UGZ9sa8xnXp9ItNA4KV4hsknVbdc6Igv.7d05IXJ8ADaSTELMFUoScLNMx l8sWtwz3vC8y7pMl..dF7IxKZR8yZHIvfrgDLco0cRSCla8mhiuutVqhhwJ6 Gfqn_wFVEG5Vktnqjzoh_0SmOYip7rS4ztudI6O6JlK7kTRsCbmK.51HNgvU _KSnICn7KfE5oEJJcVHExzkS8XTfwt6YOBc6BibNRTC9.opKMBTqXkal6iBj FvnGCyLed1WtTuKh04WcGxtTn_bE4aQeHkWRESp5vAvh7xDsxC2tgjqQaGBv Z0AwEXunMbPa12M0vip8D3qkkDdv1DGfxV_cexZT0N3kLcM8SKRxaPty40VZ MnLlWgFMWJv0urF_LU9Gh4eOvRIJTQi1dSU.97Mk9MVXgN4KYAoTcSiM8WwH 9qBpKgqyJgROTM__4Aktl6y_fSuERnFVBQ5_FtRn7YH8uP_j_IvlHDbMFnDp P1snrXrSZgU8Iaw-- Received: from [208.185.20.30] by web160906.mail.bf1.yahoo.com via HTTP; Thu, 31 Jan 2013 12:19:17 PST X-Rocket-MIMEInfo: 001.001,SSBkZWNpZGVkIHRvIGRpZyBpbiB0byB0aGUgc291cmNlIGNvZGUsIGxvb2tzIGxpa2UgaW4gdGhlIGNhc2Ugb2Ygbm9kZXRvb2wgcmVwYWlyLCBpZiB0aGUgY3VycmVudCBub2RlIHNlZXMgdGhlIGRpZmZlcmVuY2UgYmV0d2VlbiB0aGUgcmVtb3RlIG5vZGVzIGJhc2VkIG9uIHRoZSBtZXJrbGUgdHJlZSBjYWxjdWxhdGlvbiwgaXQgd2lsbCBzdGFydCBhIHN0cmVhbXJlcGFpciBzZXNzaW9uIHRvIGFzayB0aGUgcmVtb3RlIG5vZGVzIHRvIHN0cmVhbSBkYXRhIGJldHdlZW4gwqBlYWNoIG90aGVyLsKgCgpCdXQBMAEBAQE- X-Mailer: YahooMailWebService/0.8.131.499 References: <1359658240.54372.YahooMailNeo@web160901.mail.bf1.yahoo.com> Message-ID: 
<1359663557.47474.YahooMailNeo@web160906.mail.bf1.yahoo.com> Date: Thu, 31 Jan 2013 12:19:17 -0800 (PST) From: Wei Zhu Reply-To: Wei Zhu Subject: Re: General question regarding bootstrap and nodetool repair To: "user@cassandra.apache.org" In-Reply-To: <1359658240.54372.YahooMailNeo@web160901.mail.bf1.yahoo.com> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="967773369-1829278262-1359663557=:47474" X-Virus-Checked: Checked by ClamAV on apache.org --967773369-1829278262-1359663557=:47474 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable
I decided to dig into the source code. It looks like, in the case of nodetool repair, if the current node sees a difference between the remote nodes based on the Merkle tree calculation, it will start a streamrepair session to ask the remote nodes to stream data between each other.

But I am still not sure about my first question regarding the bootstrap. Anyone?

Thanks.
-Wei

________________________________
From: Wei Zhu
To: Cassandr usergroup
Sent: Thursday, January 31, 2013 10:50 AM
Subject: General question regarding bootstrap and nodetool repair

Hi,
After messing around with my Cassandra cluster recently, I think I need some basic understanding of how things work behind the scenes regarding data streaming.
Let's say we have a three-node cluster with RF = 3. Suppose node 3 dies for some reason and I want to replace it with a new node with the same (maybe minus one) range. During the bootstrap, how is the data streamed?
From what I observed, node 3 has replicas of its primary range on nodes 4 and 5, so it streams the data from them and starts to compact it. Also, node 3 holds replicas for the primary range of node 2, so it streams data from node 2 and node 4. Similarly, it holds replicas for node 1, so data is streamed from node 1 and node 2. So during the bootstrapping, it basically gets the data from all the replicas (2 copies each), so it will require double the disk space in order to hold the data? Over time, those SSTables will be compacted and the redundant data will be removed? Is that true?

If we issue nodetool repair -pr on node 3, apart from data streaming from nodes 4 and 5 to node 3, we also see data streaming between nodes 4 and 5, since they hold the replicas. But I don't see any log entries regarding "merkle tree calculation" on nodes 4 and 5. Just wondering how they know what data to stream in order to repair nodes 4 and 5?

Thanks.
-Wei
--967773369-1829278262-1359663557=:47474 Content-Type: text/html; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable
I decided to dig into the source code. It looks like, in the case of nodetool repair, if the current node sees a difference between the remote nodes based on the Merkle tree calculation, it will start a streamrepair session to ask the remote nodes to stream data between each other.
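To make the Merkle tree comparison a bit more concrete, here is a minimal sketch of the general technique. It is not Cassandra's actual code: the function names (`bucket_of`, `leaf_hashes`, `build_tree`, `differing_leaves`), the choice of hash functions, and the bucketing of rows by key hash are all assumptions made up for illustration. Each replica hashes its rows into a fixed number of leaf buckets, builds a hash tree over them, and the coordinator then only descends into subtrees whose hashes differ.

```python
import hashlib


def bucket_of(key, num_leaves):
    """Hypothetical bucketing rule: map a row key to a leaf index by its hash."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % num_leaves


def leaf_hashes(rows, num_leaves):
    """Hash every bucket of rows into one leaf hash (num_leaves must be a power of two)."""
    buckets = [[] for _ in range(num_leaves)]
    for key in sorted(rows):
        buckets[bucket_of(key, num_leaves)].append(f"{key}={rows[key]}")
    return [hashlib.sha256("|".join(b).encode()).hexdigest() for b in buckets]


def build_tree(leaves):
    """Return the tree as a list of levels: levels[0] are the leaves, levels[-1] is [root]."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([
            hashlib.sha256((prev[i] + prev[i + 1]).encode()).hexdigest()
            for i in range(0, len(prev), 2)
        ])
    return levels


def differing_leaves(tree_a, tree_b):
    """Walk both trees from the root down, following only mismatching subtrees."""
    top = len(tree_a) - 1
    if tree_a[top][0] == tree_b[top][0]:
        return []  # roots match, so the replicas agree on every bucket
    frontier = [0]
    for level in range(top, 0, -1):
        frontier = [
            child
            for parent in frontier
            for child in (2 * parent, 2 * parent + 1)
            if tree_a[level - 1][child] != tree_b[level - 1][child]
        ]
    return frontier


# Two hypothetical replicas that disagree on a single row.
replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = dict(replica_a, k2="stale")
tree_a = build_tree(leaf_hashes(replica_a, 8))
tree_b = build_tree(leaf_hashes(replica_b, 8))
print(differing_leaves(tree_a, tree_b))  # indices of the buckets that would need streaming
```

Only the buckets returned by `differing_leaves` would need to be exchanged between the two replicas, which is the point of comparing trees instead of comparing the data itself.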

But I am still not sure about my first question regarding the bootstrap. Anyone?

Thanks.
-Wei

From: Wei Zhu <wz1975@yahoo.com>
To: Cassandr usergroup <user@cassandra.apache.org>
Sent: Thursday, January 31, 2013 10:50 AM
Subject: General question regarding bootstrap and nodetool repair

Hi,
After messing around with my Cassandra cluster recently, I think I need some basic understanding of how things work behind the scenes regarding data streaming.
Let's say we have a three-node cluster with RF = 3. Suppose node 3 dies for some reason and I want to replace it with a new node with the same (maybe minus one) range. During the bootstrap, how is the data streamed?
From what I observed, node 3 has replicas of its primary range on nodes 4 and 5, so it streams the data from them and starts to compact it. Also, node 3 holds replicas for the primary range of node 2, so it streams data from node 2 and node 4. Similarly, it holds replicas for node 1, so data is streamed from node 1 and node 2. So during the bootstrapping, it basically gets the data from all the replicas (2 copies each), so it will require double the disk space in order to hold the data? Over time, those SSTables will be compacted and the redundant data will be removed? Is that true?
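The replica layout described above can be sketched as follows. This is only an illustration under stated assumptions: a hypothetical five-node ring numbered 1 to 5, RF = 3, and a simple placement rule where each primary range is stored on its owner and the next RF - 1 nodes clockwise. The helper names `replicas_for_range` and `ranges_held_by` are made up for this example, and the sketch says nothing about which of the listed nodes a bootstrap actually picks as the streaming source.

```python
def replicas_for_range(owner, num_nodes, rf):
    """Nodes storing the primary range of `owner`: the owner plus the next rf - 1 nodes clockwise."""
    return [(owner - 1 + i) % num_nodes + 1 for i in range(rf)]


def ranges_held_by(node, num_nodes, rf):
    """Map each range owner whose data `node` stores to the other nodes holding that range."""
    held = {}
    for owner in range(1, num_nodes + 1):
        replicas = replicas_for_range(owner, num_nodes, rf)
        if node in replicas:
            held[owner] = [n for n in replicas if n != node]
    return held


# Which ranges does node 3 hold, and which other nodes have a copy of each?
print(ranges_held_by(3, 5, 3))  # {1: [1, 2], 2: [2, 4], 3: [4, 5]}
```

Under these assumptions the output matches the observation: node 3's own range also lives on nodes 4 and 5, node 2's range on nodes 2 and 4, and node 1's range on nodes 1 and 2.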

If we issue nodetool repair -pr on node 3, apart from data streaming from nodes 4 and 5 to node 3, we also see data streaming between nodes 4 and 5, since they hold the replicas. But I don't see any log entries regarding "merkle tree calculation" on nodes 4 and 5. Just wondering how they know what data to stream in order to repair nodes 4 and 5?

Thanks.
-Wei



--967773369-1829278262-1359663557=:47474--