Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of bburruss@real.com designates
 207.188.23.6 as permitted sender)
From: Todd Burruss <bburruss@real.com>
To: Todd Burruss <bburruss@real.com>,
  "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Sun, 21 Mar 2010 11:30:50 -0700
Subject: RE: node repair
Thread-Topic: node repair
Thread-Index: AcrIWplxySvemChjT/GF21oxHKne4gACbkeuAA0OXOEAIwVpCA==
Message-ID: <766B5A29D28DA442AB229AAEE2AFC44507DF67ABBC@SEAMBX.corp.real.com>
References: 
 <766B5A29D28DA442AB229AAEE2AFC44507DF67ABB8@SEAMBX.corp.real.com>,<e06563881003201123v2b844e17xdbcd87f312f53fe6@mail.gmail.com>,<766B5A29D28DA442AB229AAEE2AFC44507DF67ABB9@SEAMBX.corp.real.com>,<766B5A29D28DA442AB229AAEE2AFC44507DF67ABBA@SEAMBX.corp.real.com>
In-Reply-To: <766B5A29D28DA442AB229AAEE2AFC44507DF67ABBA@SEAMBX.corp.real.com>
Accept-Language: en-US
Content-Language: en-US
acceptlanguage: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

one last comment about testing this: i stopped all the servers, wiped their
data and restarted. i let each node get to about 15 GB, then repeated the
test. nodetool repair does not repair the crashed node.

the only mildly unusual thing about my cluster is that i use RandomPartitioner
and assigned a token to each node explicitly.
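A minimal Python sketch of what balanced token assignment looks like, assuming the RandomPartitioner token space runs from 0 to 2**127 (the largest token in the `nodetool ring` output in this thread). `balanced_tokens` and `ownership` are hypothetical helpers, not Cassandra APIs. Feeding in the four tokens from the ring output shows each node owning about a quarter of the ring, so token placement alone would not seem to explain node 105 holding roughly half the load of the others.

```python
# Sketch: balanced tokens and ring ownership, ASSUMING a RandomPartitioner
# token space of 0 .. 2**127. Helper names are hypothetical, not Cassandra APIs.
RING_SIZE = 2 ** 127


def balanced_tokens(n):
    """Evenly spaced initial tokens for an n-node ring: i * 2**127 / n."""
    return [i * RING_SIZE // n for i in range(n)]


def ownership(tokens):
    """Fraction of the ring each node owns, i.e. the range (previous token, own token]."""
    ts = sorted(tokens)
    if len(ts) == 1:
        return [1.0]  # a single node owns the whole ring
    fractions = []
    for j, t in enumerate(ts):
        prev = ts[j - 1]  # for j == 0 this wraps around to the largest token
        fractions.append(((t - prev) % RING_SIZE) / RING_SIZE)
    return fractions


# Tokens reported by `nodetool ring` in this thread (nodes .102 to .105).
observed = [
    42535295865117307932921825928971026431,
    85070591730234615865843651857942052863,
    127605887595351923798765477786913079295,
    170141183460469231731687303715884105728,
]

if __name__ == "__main__":
    print(balanced_tokens(4))
    print([round(f, 4) for f in ownership(observed)])
```

The observed tokens are essentially `balanced_tokens(4)` with three of them shifted down by one, and `ownership(observed)` comes out at roughly 0.25 per node.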

________________________________________
From: Todd Burruss
Sent: Saturday, March 20, 2010 6:48 PM
To: Todd Burruss; user@cassandra.apache.org
Subject: RE: node repair

fyi ... i just compacted and node 105 is definitely not being repaired
________________________________________
From: Todd Burruss
Sent: Saturday, March 20, 2010 12:34 PM
To: user@cassandra.apache.org
Subject: RE: node repair

same IP, same token. i'm following item #3 in the "Handling Failure" section.

it is running, part of the ring, and seems to be handling reads/writes, but
it does not appear to have received a copy of its data (it is the last node
below). i've searched all the logs for ERRORs but there are none. i will
compact the other nodes, but i don't think it will make a difference.

[bburruss@kv-app05 ~]$ ~/cassandra/bin/nodetool -h localhost -p 9000 ring
Address          Status     Load          Range                                      Ring
                                          170141183460469231731687303715884105728
192.168.132.102  Up         130.22 GB     42535295865117307932921825928971026431     |<--|
192.168.132.103  Up         131.03 GB     85070591730234615865843651857942052863     |   |
192.168.132.104  Up         125.7 GB      127605887595351923798765477786913079295    |   |
192.168.132.105  Up         65.62 GB      170141183460469231731687303715884105728    |-->|
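A rough Python sketch for spotting this symptom programmatically: parse the node rows of `nodetool ring` output and flag any node whose load is well below its peers. The regex is written against the output shown in this thread (whose raw form had no space between address and status, so the pattern tolerates that); the 0.75 threshold and the helper names are hypothetical choices. A low load only hints that a node is missing data; it does not by itself prove that repair failed.

```python
import re
import statistics

# `nodetool ring` output from this thread.
RING_OUTPUT = """\
Address          Status     Load          Range                                      Ring
                                          170141183460469231731687303715884105728
192.168.132.102  Up         130.22 GB     42535295865117307932921825928971026431     |<--|
192.168.132.103  Up         131.03 GB     85070591730234615865843651857942052863     |   |
192.168.132.104  Up         125.7 GB      127605887595351923798765477786913079295    |   |
192.168.132.105  Up         65.62 GB      170141183460469231731687303715884105728    |-->|
"""

# address, status, load value, load unit, token; whitespace between the
# address and the status is optional because the raw output omitted it.
ROW_RE = re.compile(r"^(\d+\.\d+\.\d+\.\d+)\s*(Up|Down)\s+([\d.]+)\s*(KB|MB|GB|TB)\s+(\d+)")
# Assumption: sizes use 1024-based units. Rows with other units are skipped.
TO_GB = {"KB": 1 / 1024**2, "MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}


def parse_loads(text):
    """Map node address -> load in GB for every node row in the ring output."""
    loads = {}
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            addr, _status, value, unit, _token = m.groups()
            loads[addr] = float(value) * TO_GB[unit]
    return loads


def low_load_nodes(loads, threshold=0.75):
    """Nodes whose load is below `threshold` x the median load of the other nodes."""
    flagged = []
    for addr, load in loads.items():
        others = [v for a, v in loads.items() if a != addr]
        if others and load < threshold * statistics.median(others):
            flagged.append((addr, load))
    return flagged


if __name__ == "__main__":
    print(low_load_nodes(parse_loads(RING_OUTPUT)))
```

Comparing against the median of the *other* nodes keeps the suspect node from dragging down its own baseline.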


________________________________________
From: Jonathan Ellis [jbellis@gmail.com]
Sent: Saturday, March 20, 2010 11:23 AM
To: user@cassandra.apache.org
Subject: Re: node repair

if you bring up a new node w/ a different ip but the same token, it
will confuse things.

http://wiki.apache.org/cassandra/Operations "handling failure" section
covers best practices here.

On Sat, Mar 20, 2010 at 11:51 AM, Todd Burruss <bburruss@real.com> wrote:
> i had a node fail and lose all its data, so i brought it back up fresh but
> assigned it the same token in storage-conf.xml, then ran nodetool repair.
>
> all compactions have finished and no streams are happening. nothing. so i
> did it again. same thing. i don't think it's working. is there a log
> message i can search for? INFO is my log level. i could try it again with
> DEBUG, i suppose.
>
> thx
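On the question of what to search the logs for: a small, generic Python sketch that counts lines per log level and pulls out WARN/ERROR/FATAL lines plus any line containing a chosen keyword. It assumes log4j-style lines where the level is the first word on the line; the log path and keyword are placeholders, and it does not rely on any specific Cassandra log message.

```python
import re
import sys
from collections import Counter

# Assumption: each log line starts with its level (log4j-style), e.g.
# " INFO [thread] ...". Adjust LEVEL_RE if your log layout differs.
LEVEL_RE = re.compile(r"^\s*(DEBUG|INFO|WARN|ERROR|FATAL)\b")


def scan_log(lines, keyword=None):
    """Count lines per level; collect WARN/ERROR/FATAL lines and keyword matches."""
    counts = Counter()
    hits = []
    needle = keyword.lower() if keyword else None
    for line in lines:
        m = LEVEL_RE.match(line)
        level = m.group(1) if m else None  # continuation lines have no level
        if level:
            counts[level] += 1
        if level in ("WARN", "ERROR", "FATAL") or (needle and needle in line.lower()):
            hits.append(line.rstrip("\n"))
    return counts, hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py /path/to/system.log [keyword]  (path is a placeholder)
    keyword = sys.argv[2] if len(sys.argv) > 2 else None
    with open(sys.argv[1]) as f:
        counts, hits = scan_log(f, keyword)
    print(dict(counts))
    for hit in hits:
        print(hit)
```

Running it once at INFO and again after switching to DEBUG makes it easy to compare how much activity each level actually records around a repair.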