Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of skrolle@gmail.com designates
 209.85.160.172 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAHn9q9PxD=GaSL7eHJ8BZrCPkzZmfQkoAhwoHdgxjsZKp6+-Hw@mail.gmail.com>
References: <62E5A0D8E5144EE9AE985074A6B035C6@ntoklo.com>
	<CAHn9q9PxD=GaSL7eHJ8BZrCPkzZmfQkoAhwoHdgxjsZKp6+-Hw@mail.gmail.com>
Date: Mon, 2 Jul 2012 14:21:54 +0200
Message-ID: 
 <CAN3fqkxv9OAx7uHyQ2g22pme4wYLVC60joxsBiuJMk5CeBvksg@mail.gmail.com>
Subject: =?windows-1252?Q?Re=3A_Nodes_marked_dead=85=2E_leap_second=3F?=
From: =?ISO-8859-1?Q?Henrik_Schr=F6der?= <skrolle@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=001636d34591ee4ebd04c3d7d722

--001636d34591ee4ebd04c3d7d722
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Bug: https://lkml.org/lkml/2012/6/30/122

Simple fix to reset the leap second flag (run as root):

date; date `date +"%m%d%H%M%C%y.%S"`; date;
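
For anyone wary of pasting that blind, here is a minimal dry-run sketch of
what it does, assuming a POSIX shell and a `date` that accepts the
MMDDhhmmCCYY.ss form (GNU coreutils does). The helper name leap_stamp is
made up for illustration. It only prints the command; nothing is changed.

```shell
#!/bin/sh
# Dry run of the fix: print the argument the one-liner would hand back to
# `date`, without touching the clock.

# leap_stamp is a hypothetical helper name, not part of any tool.
# It prints the current local time as MMDDhhmmCCYY.ss.
leap_stamp() {
    date +"%m%d%H%M%C%y.%S"
}

echo "clock now:      $(date)"
echo "would execute:  date $(leap_stamp)"

# To apply the fix for real (as root), set the clock to what it already shows:
#   date "$(leap_stamp)"
```

The one-liner sets the clock to the time it already shows; the two bare
`date` calls around it just print the time before and after.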


/Henrik

On Mon, Jul 2, 2012 at 1:56 PM, Jean Paul Adant
<jean.paul.adant@gmail.com> wrote:

> Hi,
>
> I had the same problem with Cassandra 1.1.1 on Ubuntu 11.10.
> I had to reboot all nodes.
> I'm interested in any information about this.
>
> Thanks
>
> Jean Paul
>
> 2012/7/2 Filippo Diotalevi <filippo@ntoklo.com>
>
>>  Hi,
>> we had some really weird issues during the weekend: our Cassandra nodes
>> started marking other (working) nodes in the cluster as dead. That
>> happened all Sunday, and it's still happening. Nodes are marked dead and
>> up again all the time...
>>
>> Some example logs:
>>
>> INFO [GossipTasks:1] 2012-07-02 06:55:01,804 Gossiper.java (line 818)
>> InetAddress /xx.xx.xx.233 is now dead.
>> INFO [GossipTasks:1] 2012-07-02 06:55:01,805 Gossiper.java (line 818)
>> InetAddress /xx.xx.xx.235 is now dead.
>> INFO [GossipStage:1] 2012-07-02 06:55:21,748 Gossiper.java (line 804)
>> InetAddress /xx.xx.xx.233 is now UP
>> INFO [GossipStage:1] 2012-07-02 06:55:21,893 Gossiper.java (line 804)
>> InetAddress /xx.xx.xx.235 is now UP
>> INFO [GossipTasks:1] 2012-07-02 06:56:03,877 Gossiper.java (line 818)
>> InetAddress /xx.xx.xx.235 is now dead.
>> INFO [GossipTasks:1] 2012-07-02 06:57:58,537 Gossiper.java (line 818)
>> InetAddress /xx.xx.xx.233 is now dead.
>> INFO [GossipStage:1] 2012-07-02 06:59:06,444 Gossiper.java (line 804)
>> InetAddress /xx.xx.xx.233 is now UP
>>
>>
>> I couldn't find any real exception in the logs, but I noticed that the
>> first error occurred at
>>  INFO [GossipTasks:1] 2012-07-01 02:00:31,169 Gossiper.java (line 818)
>> InetAddress /xx.xx.xx.234 is now dead.
>>
>> 2012-07-01 02:00:31,169, in the German timezone where the machine is
>> hosted, is about 30 seconds after June 30th 23:59:60 UTC, the leap
>> second that caused quite a few issues this weekend.
>>
>> Can it be the cause of the cluster failure? Has anybody noticed similar
>> issues? (also see
>> https://twitter.com/redditstatus/status/219244389044731904 )
>>
>> I'm running Ubuntu 10.04.3 LTS.
>>
>> Many thanks,
>> --
>> Filippo Diotalevi
>>
>>
>
>
> --
> -----------------------------------------------------
> Jean Paul Adant - Cr=E9ative-Ing=E9nierie
> jean.paul.adant@gmail.com
>
>
>
>

--001636d34591ee4ebd04c3d7d722--