From: Mark Jones
To: "user@cassandra.apache.org"
Date: Wed, 7 Apr 2010 09:39:46 -0500
Subject: RE: What is loadbalance supposed to do? 0.6.0RC1

The log said Bootstrapping @ 07:34 (since it was 08:35, I assumed it wasn't doing anything; also, CPU usage was < 10%).

Turns out, when I restarted the node, it claimed the time was 7:35 rather than 8:35. Why would log4j be off by one hour? We are on CDT here, and have been for more than a week. The date command returns the appropriate time (Wed Apr 7 09:24:50 CDT 2010), I see no evidence of a TZ variable, and /etc/timezone shows "America/Chicago".

If it was off by 6 hours instead of 1, I could understand this, but it's only off by one hour. System.getProperties() reports the timezone as blank.

Also, if the data is pushed out to the other nodes before the bootstrapping, why has data been lost? Does this mean that decommissioning a node results in data loss?

-----Original Message-----
From: Sylvain Lebresne [mailto:sylvain@yakaz.com]
Sent: Wednesday, April 07, 2010 9:07 AM
To: user@cassandra.apache.org
Subject: Re: What is loadbalance supposed to do? 0.6.0RC1

> It shouldn't remove a node from the ring should it? (appears it did)

It does. As explained here: http://wiki.apache.org/cassandra/Operations, loadbalance 'decommissions' the node and then adds it back as a bootstrapping node (roughly). So the node disappearing is expected, and it is supposed to come back. But this is not a quick operation (and certainly not one you want to do every other day). You apparently restarted Cassandra while it was doing its stuff. Not sure the loss of data is to be expected though.

> It shouldn't remove data from db, should it?
> (data size appears to grow, but records are now missing)
>
> Loaded 38 million "rows" and the ring looked like this:
>
> mark@ec2:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --host 192.168.1.116 ring
> Address       Status     Load          Range                                      Ring
>                                        167730615856220406399741259265091647472
> 192.168.1.116 Up         4.81 GB       54880762918591020775962843965839761529     |<--|
> 192.168.1.119 Up         12.96 GB      160455137948102479104219052453775170160    |   |
> 192.168.1.12  Up         8.98 GB       167730615856220406399741259265091647472    |-->|
>
> So I did this:
> mark@record:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --host 192.168.1.12 loadbalance
>
> And this happened (even though Cassandra was still running):
>
> mark@record:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --host 192.168.1.12 ring
> Address       Status     Load          Range                                      Ring
>                                        160455137948102479104219052453775170160
> 192.168.1.116 Up         12.71 GB      54880762918591020775962843965839761529     |<--|
> 192.168.1.119 Up         13.47 GB      160455137948102479104219052453775170160    |-->|
>
> After restarting Cassandra on .12
>
> mark@record:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --host 192.168.1.12 ring
> Address       Status     Load          Range                                      Ring
>                                        160455137948102479104219052453775170160
> 192.168.1.116 Up         12.71 GB      54880762918591020775962843965839761529     |<--|
> 192.168.1.12  Up         8.98 GB       107669873051407416105654071439122680093    |   |
> 192.168.1.119 Up         13.47 GB      160455137948102479104219052453775170160    |-->|
>
> Now I have more data, but nearly 50% of my queries are failing (not found). This data was checked before the load balance was done.
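
For anyone reading the ring listings in this thread, the "Range" tokens can be turned into rough ownership fractions, which makes it easier to see what loadbalance changed. The snippet below is only a sketch: it assumes the cluster uses RandomPartitioner with a token space of 0 to 2**127 (the thread does not say which partitioner is configured), and the function name `ownership` is made up for illustration. Each node is treated as primary for the range from the previous token (exclusive) up to its own token (inclusive), wrapping around the ring.

```python
# Sketch: fraction of the token space each node is primary for.
# Assumption (not stated in the thread): RandomPartitioner, tokens in [0, 2**127).
RING_SIZE = 2 ** 127


def ownership(tokens):
    """Return {node: fraction of the ring in (previous token, own token]}."""
    ordered = sorted(tokens.items(), key=lambda item: item[1])
    shares = {}
    for i, (node, token) in enumerate(ordered):
        prev_token = ordered[i - 1][1]  # index -1 wraps to the highest token
        # A single-node ring gives span 0 after the modulo; it owns everything.
        span = (token - prev_token) % RING_SIZE or RING_SIZE
        shares[node] = span / RING_SIZE
    return shares


# Tokens copied from the "Range" column of the ring listings in this thread.
before = ownership({
    "192.168.1.116": 54880762918591020775962843965839761529,
    "192.168.1.119": 160455137948102479104219052453775170160,
    "192.168.1.12": 167730615856220406399741259265091647472,
})
after = ownership({
    "192.168.1.116": 54880762918591020775962843965839761529,
    "192.168.1.12": 107669873051407416105654071439122680093,
    "192.168.1.119": 160455137948102479104219052453775170160,
})

for label, shares in (("before", before), ("after", after)):
    for node, share in sorted(shares.items()):
        print(f"{label:6} {node:14} {share:6.1%}")
```

Under that assumption, 192.168.1.12 comes out as primary for only a few percent of the token space before loadbalance and close to a third afterwards, which matches the new token sitting near the middle of the range 192.168.1.119 previously held. The Load column also counts replica data, so it should not be expected to track these fractions exactly.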