From: shimi
To: user@cassandra.apache.org
Date: Sun, 18 Jul 2010 20:09:45 +0300
Subject: Re: Bootstrap question

If I have problems with never-ending bootstrapping, I do the following. I try each step in order; if one doesn't help, I try the next. It might not be the right thing to do, but it worked for me.

1. Restart the bootstrapping node
2. If I see streaming 0/xxxx, I restart the node and all the streaming nodes
3. Restart all the nodes
4. If there is data on the bootstrapping node, I delete it before I restart.
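
The list above is an escalation ladder: apply one remedy, check whether bootstrap finishes, and move on only if it does not. A minimal Python sketch of that control flow follows. The `escalate` function, the `Remedy` pairs, and the `bootstrap_finished` check are all hypothetical names introduced for illustration; none of them is a Cassandra API, and the actual restart or delete actions are left to whatever the operator supplies.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each remedy is a (label, action) pair. The actions are hypothetical
# callables supplied by the operator, not Cassandra APIs.
Remedy = Tuple[str, Callable[[], None]]


def escalate(remedies: Sequence[Remedy],
             bootstrap_finished: Callable[[], bool]) -> Optional[str]:
    """Apply remedies in order and stop at the first one that helps.

    Returns the label of the remedy after which bootstrap finished,
    or None if every remedy was tried without success.
    """
    for label, action in remedies:
        action()
        if bootstrap_finished():
            return label
    return None
```

In practice `bootstrap_finished` would need to wait a reasonable time before answering, since streaming can take a while to start.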

Good luck
Shimi

On Sun, Jul 18, 2010 at 12:21 AM, Anthony Molinaro <anthonym@alumni.caltech.edu> wrote:
So still waiting for any sort of answer on this one. The cluster still refuses to do anything when I bring up new nodes. I shut down all the new nodes and am waiting. I'm guessing that maybe the old nodes have some state which needs to get cleared out? Is there anything I can do at this point? Are there alternate strategies for bootstrapping I can try? (For instance can I just scp all the sstables to all the new nodes and do a repair, would that actually work?).

Anyone seen this sort of issue? All this is with 0.6.3 so I assume
eventually others will see this issue.

-Anthony

On Thu, Jul 15, 2010 at 10:45:08PM -0700, Anthony Molinaro wrote:
> Okay, so things were pretty messed up. I shut down all the new nodes,
> then the old nodes started doing the half the ring is down garbage which
> pretty much requires a full restart of everything. So I had to shut
> everything down, then bring the seed back, then the rest of the nodes,
> so they finally all agreed on the ring again.
>
> Then I started one of the new nodes, and have been watching the logs, so
> far 2 hours since the "Bootstrapping" message appeared in the new
> log and nothing has happened. No anticompaction messages anywhere, there's
> one node compacting, but it's on the other end of the ring, so nowhere near
> that new node. I'm wondering if it will ever get data at this point.
>
> Is there something else I should try? The only thing I can think of
> is deleting the system directory on the new node, and restarting, so
> I'll try that and see if it does anything.
>
> -Anthony
>
> On Thu, Jul 15, 2010 at 03:43:49PM -0500, Jonathan Ellis wrote:
> > On Thu, Jul 15, 2010 at 3:28 PM, Anthony Molinaro
> > <anthonym@alumni.caltech.edu> wrote:
> > > Is the fact that 2 new nodes are in the range messing it up?
> >
> > Probably.
> >
> > > And if so
> > > how do I recover (I'm thinking, shut down new nodes 2,3,4,5, then bringing
> > > up nodes 2,4, waiting for them to finish, then bringing up 3,5?).
> >
> > Yes.
> >
> > You might have to restart the old nodes too to clear out the confusion.
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of Riptano, the source for professional Cassandra support
> > http://riptano.com
>
> --
> ------------------------------------------------------------------------
> Anthony Molinaro <anthonym@alumni.caltech.edu>

--
------------------------------------------------------------------------
Anthony Molinaro <anthonym@alumni.caltech.edu>
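
The staged recovery discussed in the quoted exchange (shut down new nodes 2,3,4,5, bring up 2 and 4, wait for them to finish, then bring up 3 and 5) amounts to splitting the new nodes into two alternating batches. A minimal Python sketch of that split is below. The function name `staggered_batches` is hypothetical, and the sketch assumes the nodes are listed in ring order and that it is neighbouring new nodes that end up in the same range, which the thread only suggests ("Probably").

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def staggered_batches(new_nodes: Sequence[T]) -> List[List[T]]:
    """Split new nodes, listed in ring order, into two alternating batches.

    Every other node goes into the first batch and the rest into the
    second, so neighbours in the list never bootstrap at the same time.
    Empty batches are dropped.
    """
    first = list(new_nodes[0::2])
    second = list(new_nodes[1::2])
    return [batch for batch in (first, second) if batch]


print(staggered_batches([2, 3, 4, 5]))  # [[2, 4], [3, 5]]
```

Each batch would be started only after the previous one has finished bootstrapping; how to detect that is outside this sketch.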
