From: Henrik Schröder
To: user@cassandra.apache.org
Date: Mon, 2 Jul 2012 14:21:54 +0200
Subject: Re: Nodes marked dead…. leap second?

Bug: https://lkml.org/lkml/2012/6/30/122

Simple fix to reset the leap second flag (run as root):

date; date `date +"%m%d%H%M%C%y.%S"`; date;

/Henrik

On Mon, Jul 2, 2012 at 1:56 PM, Jean Paul Adant wrote:

> Hi,
>
> I had the same problem with Cassandra 1.1.1 on Ubuntu 11.10.
> I had to reboot all nodes.
> I'm interested in any information about this.
>
> Thanks
>
> Jean Paul
>
> 2012/7/2 Filippo Diotalevi
>
>> Hi,
>> we had some really weird issues over the weekend, with our Cassandra
>> nodes starting to mark other (working) nodes in the cluster as dead.
>> That happened all Sunday, and it's still happening. Nodes are marked
>> dead and up all the time….
>>
>> Some example logs:
>>
>> INFO [GossipTasks:1] 2012-07-02 06:55:01,804 Gossiper.java (line 818) InetAddress /xx.xx.xx.233 is now dead.
>> INFO [GossipTasks:1] 2012-07-02 06:55:01,805 Gossiper.java (line 818) InetAddress /xx.xx.xx.235 is now dead.
>> INFO [GossipStage:1] 2012-07-02 06:55:21,748 Gossiper.java (line 804) InetAddress /xx.xx.xx.233 is now UP
>> INFO [GossipStage:1] 2012-07-02 06:55:21,893 Gossiper.java (line 804) InetAddress /xx.xx.xx.235 is now UP
>> INFO [GossipTasks:1] 2012-07-02 06:56:03,877 Gossiper.java (line 818) InetAddress /xx.xx.xx.235 is now dead.
>> INFO [GossipTasks:1] 2012-07-02 06:57:58,537 Gossiper.java (line 818) InetAddress /xx.xx.xx.233 is now dead.
>> INFO [GossipStage:1] 2012-07-02 06:59:06,444 Gossiper.java (line 804) InetAddress /xx.xx.xx.233 is now UP
>>
>> I couldn't find any real exception in the logs, but I noticed that the
>> first error occurred at
>>
>> INFO [GossipTasks:1] 2012-07-01 02:00:31,169 Gossiper.java (line 818) InetAddress /xx.xx.xx.234 is now dead.
>>
>> 2012-07-01 02:00:31,169, in the German timezone where the machine is
>> hosted, falls about 30 seconds after June 30th 23:59:60 UTC, the leap
>> second that caused quite a few issues this weekend.
>>
>> Could it be the cause of the cluster failure? Has anybody noticed
>> similar issues? (Also see
>> https://twitter.com/redditstatus/status/219244389044731904 )
>>
>> I'm running Ubuntu 10.04.3 LTS.
>>
>> Many thanks,
>> --
>> Filippo Diotalevi
>
> --
> -----------------------------------------------------
> Jean Paul Adant - Créative-Ingénierie
> jean.paul.adant@gmail.com

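For anyone who wants to see what the one-liner in this message does before running it on a live node, here is a minimal dry-run sketch. It only prints the command it would run and changes nothing. It assumes a date(1) that accepts MMDDhhmmCCYY.ss for setting the clock and understands -d (GNU coreutils does; other implementations may not). The variable names stamp and first_dead_utc are made up for this illustration. The last part converts the first "dead" log timestamp from the thread to UTC, assuming the logs are in CEST (UTC+2).

```shell
#!/bin/sh
# Dry run of the clock-reset one-liner: prints the command, does not execute it.
# Assumption: GNU date (MMDDhhmmCCYY.ss set syntax, -d for parsing timestamps).

# %m%d%H%M = month, day, hour, minute; %C%y = century + two-digit year; %S = seconds.
# The result is the current time in exactly the form `date` expects when setting the clock.
stamp=$(date +"%m%d%H%M%C%y.%S")

echo "current time: $(date)"
echo "would run:    date $stamp   (needs root; sets the clock to the time it already shows)"

# Sanity check on the timestamp quoted in the thread:
# 2012-07-01 02:00:31 at UTC+2, expressed in UTC.
first_dead_utc=$(TZ=UTC date -d '2012-07-01 02:00:31 +0200' '+%Y-%m-%d %H:%M:%S')
echo "first 'dead' log entry in UTC: $first_dead_utc"
```

The set is effectively a no-op for wall-clock time, apart from the fraction of a second lost between reading and writing it. According to the linked LKML report, the act of setting the clock is what clears the stuck state, so a reboot should not be needed. Running the real command on every affected node is left to the operator.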