From: Jonathan Ellis
Date: Mon, 19 Jul 2010 21:36:22 -0500
Subject: Re: Bootstrap question
To: user@cassandra.apache.org

What gets logged on the old nodes at debug, when you try to add a
single new machine after a full cluster restart?

Removing Location would blow away the nodes' token information...  It
should be safe if you set the InitialToken to what it used to be on
each machine before bringing it up after nuking those.  Better
snapshot the system keyspace first, just in case.

On Sun, Jul 18, 2010 at 2:01 PM, Anthony Molinaro wrote:
> Yeah, I tried all that already and it didn't seem to work, no new nodes
> will bootstrap, which makes me think there's some saved state somewhere,
> preventing a new node from bootstrapping.  I think maybe the Location
> sstables?  Is it safe to nuke those on all hosts and restart everything?
> (I just don't want to lose actual data).
>
> Thanks for the ideas,
>
> -Anthony
>
> On Sun, Jul 18, 2010 at 08:09:45PM +0300, shimi wrote:
>> If I have problems with never-ending bootstrapping I do the following. I try
>> each one; if it doesn't help, I try the next.
>> It might not be the right thing to do, but it worked for me.
>>
>> 1. Restart the bootstrapping node
>> 2. If I see streaming 0/xxxx I restart the node and all the streaming nodes
>> 3. Restart all the nodes
>> 4. If there is data in the bootstrapping node I delete it before I restart.
>>
>> Good luck
>> Shimi
>>
>> On Sun, Jul 18, 2010 at 12:21 AM, Anthony Molinaro <anthonym@alumni.caltech.edu> wrote:
>>
>> > So still waiting for any sort of answer on this one.  The cluster still
>> > refuses to do anything when I bring up new nodes.  I shut down all the
>> > new nodes and am waiting.  I'm guessing that maybe the old nodes have
>> > some state which needs to get cleared out?  Is there anything I can do
>> > at this point?  Are there alternate strategies for bootstrapping I can
>> > try?  (For instance can I just scp all the sstables to all the new
>> > nodes and do a repair, would that actually work?).
>> >
>> > Anyone seen this sort of issue?  All this is with 0.6.3 so I assume
>> > eventually others will see this issue.
>> >
>> > -Anthony
>> >
>> > On Thu, Jul 15, 2010 at 10:45:08PM -0700, Anthony Molinaro wrote:
>> > > Okay, so things were pretty messed up.  I shut down all the new nodes,
>> > > then the old nodes started doing the half the ring is down garbage which
>> > > pretty much requires a full restart of everything.  So I had to shut
>> > > everything down, then bring the seed back, then the rest of the nodes,
>> > > so they finally all agreed on the ring again.
>> > >
>> > > Then I started one of the new nodes, and have been watching the logs, so
>> > > far 2 hours since the "Bootstrapping" message appeared in the new
>> > > log and nothing has happened.  No anticompaction messages anywhere,
>> > > there's one node compacting, but it's on the other end of the ring, so
>> > > nowhere near that new node.  I'm wondering if it will ever get data at
>> > > this point.
>> > >
>> > > Is there something else I should try?  The only thing I can think of
>> > > is deleting the system directory on the new node, and restarting, so
>> > > I'll try that and see if it does anything.
>> > >
>> > > -Anthony
>> > >
>> > > On Thu, Jul 15, 2010 at 03:43:49PM -0500, Jonathan Ellis wrote:
>> > > > On Thu, Jul 15, 2010 at 3:28 PM, Anthony Molinaro wrote:
>> > > > > Is the fact that 2 new nodes are in the range messing it up?
>> > > >
>> > > > Probably.
>> > > >
>> > > > > And if so
>> > > > > how do I recover (I'm thinking, shutdown new nodes 2,3,4,5, then bringing
>> > > > > up nodes 2,4, waiting for them to finish, then bringing up 3,5?).
>> > > >
>> > > > Yes.
>> > > >
>> > > > You might have to restart the old nodes too to clear out the confusion.
>> > > >
>> > > > --
>> > > > Jonathan Ellis
>> > > > Project Chair, Apache Cassandra
>> > > > co-founder of Riptano, the source for professional Cassandra support
>> > > > http://riptano.com
>> > >
>> > > --
>> > > ------------------------------------------------------------------------
>> > > Anthony Molinaro
>> >
>> > --
>> > ------------------------------------------------------------------------
>> > Anthony Molinaro                   <anthonym@alumni.caltech.edu>
>
> --
> ------------------------------------------------------------------------
> Anthony Molinaro

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com