From: Dan Di Spaltro
Date: Thu, 1 Apr 2010 12:22:20 -0700
Subject: Re: Stalled Bootstrapping Process
To: user@cassandra.apache.org

But I didn't restart the red one.

On Thu, Apr 1, 2010 at 12:18 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
There shouldn't be anything to clean up. (The temporary streaming files it anticompacted are automatically removed on restart.)

On Thu, Apr 1, 2010 at 2:17 PM, Dan Di Spaltro <dan.dispaltro@gmail.com> wrote:
> Okay, so should I run any more commands like cleanup before?
>
> On Thu, Apr 1, 2010 at 12:09 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> Bootstrap source restarting will always fail bootstrap. You'll need
>> to restart the blue one too now, I'm afraid.
>>
>> On Thu, Apr 1, 2010 at 2:01 PM, Dan Di Spaltro <dan.dispaltro@gmail.com> wrote:
>> > Before the Red one rebooted it had 1 active STREAM-STAGE. Now it has 0 in STREAM-STAGE.
>> >
>> > On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro <dan.dispaltro@gmail.com> wrote:
>> >>
>> >> Red one.
>> >> Gary - both say nothing is happening with no destinations or sources.
>> >>
>> >> On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> >>>
>> >>> which node rebooted, the red one, or the blue one?
>> >>>
>> >>> On Thu, Apr 1, 2010 at 11:26 AM, Dan Di Spaltro <dan.dispaltro@gmail.com> wrote:
>> >>> > So we are adding another node to the cluster with the latest 0.6
>> >>> > branch (RC1). It seems to be hung in some limbo state.
>> >>> > Before bootstrapping, our cluster had 50-60GB spread fairly evenly
>> >>> > across 4 machines, with RF=3. One machine had more load than the
>> >>> > others, and sure enough bootstrapping selected that node. That is
>> >>> > the red machine. The light blue machine is the new machine.
>> >>> > I have attached a graph to illustrate when the bootstrap process
>> >>> > started.
>> >>> > In jconsole the streamingservice status was "performing
>> >>> > anticompaction..." for over 18-24 hrs. It is currently in "nothing
>> >>> > is happening". It did have 1 active STREAM-STAGE task, but the
>> >>> > machine had to be rebooted for something unrelated to Cassandra.
>> >>> > Now the light blue machine appears to be getting data, but it's
>> >>> > growing at virtually the same rate as the other machines, which
>> >>> > makes me think it is part of the cluster and not actually
>> >>> > streaming data from the machine it's supposed to.
>> >>> > Any other ideas on how to debug?
>> >>> >
>> >>> > --
>> >>> > Dan Di Spaltro
>> >>> >
>> >>
>> >>
>> >>
>> >> --
>> >> Dan Di Spaltro
>> >
>> >
>> >
>> > --
>> > Dan Di Spaltro
>> >
>
>
>
> --
> Dan Di Spaltro
>



--
Dan Di Spaltro
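The observation in the original report, that the new node's data is growing at about the same rate as the existing nodes, can be checked numerically instead of by eye on a graph. The following is only a rough sketch under stated assumptions: it takes disk-usage samples as (timestamp in seconds, bytes) pairs collected by whatever monitoring is already in place, and flags the new node as "probably streaming" only if it grows several times faster than the median existing node. The function names, the `factor` threshold, and the sample numbers are hypothetical illustrations, not Cassandra APIs or settings, and not measurements from this thread.

```python
from statistics import median


def growth_rate(samples):
    """Average growth in bytes/second between the first and last sample.

    `samples` is a list of (timestamp_seconds, bytes_on_disk) pairs.
    """
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (b1 - b0) / (t1 - t0)


def looks_like_streaming(new_node, existing_nodes, factor=3.0):
    """True if the new node grows at least `factor` times faster than the
    median existing node.

    `factor` is an arbitrary illustrative threshold, not a Cassandra setting.
    """
    baseline = median(growth_rate(s) for s in existing_nodes)
    # Guard against a zero or negative baseline (idle or compacting nodes).
    return growth_rate(new_node) >= factor * max(baseline, 1.0)


if __name__ == "__main__":
    MB = 1024 ** 2
    # Made-up samples: existing nodes gain roughly 100 MB per hour.
    existing = [
        [(0, 15_000 * MB), (3600, 15_100 * MB)],
        [(0, 14_000 * MB), (3600, 14_110 * MB)],
        [(0, 16_000 * MB), (3600, 16_095 * MB)],
    ]
    new_node = [(0, 0), (3600, 105 * MB)]
    print("new node appears to be streaming:",
          looks_like_streaming(new_node, existing))
```

A similar growth rate on the new node, as in the sample data above, is consistent with it only receiving ordinary write traffic; it does not by itself prove that bootstrap streaming has failed.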