From: Todd Burruss
To: Todd Burruss, "user@cassandra.apache.org"
Date: Sun, 21 Mar 2010 11:30:50 -0700
Subject: RE: node repair

one last comment about testing this: i stopped all the servers, wiped their data and restarted. allowed each node to get about 15gb on them, then repeated the test. the nodetool repair does not repair the crashed node.

the only minorly interesting thing about my cluster is that i use random partitioner and assigned a token to each node.

________________________________________
From: Todd Burruss
Sent: Saturday, March 20, 2010 6:48 PM
To: Todd Burruss; user@cassandra.apache.org
Subject: RE: node repair

fyi ... i just compacted and node 105 is definitely not being repaired

________________________________________
From: Todd Burruss
Sent: Saturday, March 20, 2010 12:34 PM
To: user@cassandra.apache.org
Subject: RE: node repair

same IP, same token. i'm trying Handling Failure, #3.

it is running, a part of the ring, and seems to be handling reads/writes, but does not appear to have received a copy of its data (the last node below). i've searched all the logs for ERRORs but there are none. i will compact the other nodes, but i don't think it will make a difference.

[bburruss@kv-app05 ~]$ ~/cassandra/bin/nodetool -h localhost -p 9000 ring
Address          Status  Load       Range                                    Ring
                                    170141183460469231731687303715884105728
192.168.132.102  Up      130.22 GB  42535295865117307932921825928971026431   |<--|
192.168.132.103  Up      131.03 GB  85070591730234615865843651857942052863   |   |
192.168.132.104  Up      125.7 GB   127605887595351923798765477786913079295  |   |
192.168.132.105  Up      65.62 GB   170141183460469231731687303715884105728  |-->|

________________________________________
From: Jonathan Ellis [jbellis@gmail.com]
Sent: Saturday, March 20, 2010 11:23 AM
To: user@cassandra.apache.org
Subject: Re: node repair

if you bring up a new node w/ a different ip but the same token, it will confuse things.

http://wiki.apache.org/cassandra/Operations "handling failure" section covers best practices here.

On Sat, Mar 20, 2010 at 11:51 AM, Todd Burruss wrote:
> i had a node fail, lost all data. so i brought it back up fresh, but assigned it the same token in storage-conf.xml. then ran nodetool repair.
>
> all compactions have finished, no streams are happening. nothing. so i did it again. same thing. i don't think it's working. is there a log message i can search for? INFO is my log level. i could try it again with debug i suppose.
>
> thx
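
A rough way to sanity-check the ring output from the 12:34 PM message is to compute how much of the token space each node owns and compare that with the reported load. The sketch below is illustrative only: the tokens and loads are copied from the ring output, while the 0 to 2**127 token space for the random partitioner, the helper names `ownership` and `load_ratio`, and the idea that load should roughly track ownership (which ignores replication factor and replica placement, neither of which is stated in the thread) are assumptions.

```python
from fractions import Fraction

# Assumption: RandomPartitioner tokens span 0..2**127.
RING_SIZE = 2**127

# (address, token, load in GB), copied from the ring output in the thread.
RING = [
    ("192.168.132.102", 42535295865117307932921825928971026431, 130.22),
    ("192.168.132.103", 85070591730234615865843651857942052863, 131.03),
    ("192.168.132.104", 127605887595351923798765477786913079295, 125.7),
    ("192.168.132.105", 170141183460469231731687303715884105728, 65.62),
]


def ownership(ring, ring_size=RING_SIZE):
    """Fraction of the token space per node, each owning (previous token, own token]."""
    nodes = sorted(ring, key=lambda node: node[1])
    shares = {}
    prev_token = nodes[-1][1]  # the first node's range wraps around from the last token
    for address, token, _load in nodes:
        shares[address] = Fraction((token - prev_token) % ring_size, ring_size)
        prev_token = token
    return shares


def load_ratio(ring, address):
    """Load of one node divided by the mean load of all the other nodes."""
    target = next(load for addr, _token, load in ring if addr == address)
    others = [load for addr, _token, load in ring if addr != address]
    return target / (sum(others) / len(others))


if __name__ == "__main__":
    for address, share in ownership(RING).items():
        print(f"{address} owns {float(share):.4f} of the token space")
    ratio = load_ratio(RING, "192.168.132.105")
    print(f"192.168.132.105 load vs. mean of the others: {ratio:.2f}")
```

With evenly spaced tokens like these, a node that holds about half the load of its peers is consistent with the observation that 105 has not received its data, though this is only a heuristic and not a substitute for checking the logs.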