From: aaron morton
To: user@cassandra.apache.org
Subject: Re: Dead node still being pinged
Date: Tue, 12 Jun 2012 21:03:12 +1200

Try purging the hints for 10.10.0.24 using the HintedHandOffManager MBean.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
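A minimal Java sketch of the suggestion above, done from a small program instead of jconsole. It assumes three things you should confirm in jconsole on your 1.0.9 nodes before relying on them: the default 1.0 JMX port (7199), the MBean name org.apache.cassandra.db:type=HintedHandoffManager, and an operation deleteHintsForEndpoint(String). HintPurger is a hypothetical class name.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper: asks one live node to drop the hints it stores for a dead endpoint.
final class HintPurger {
    // Assumptions: MBean name and default JMX port as in Cassandra 1.0; verify in jconsole.
    static final String HINTS_MBEAN = "org.apache.cassandra.db:type=HintedHandoffManager";
    static final int DEFAULT_JMX_PORT = 7199;

    // Standard RMI-based JMX service URL for a remote JVM.
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    static void purgeHints(String host, int port, String deadEndpoint) throws Exception {
        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Assumed operation: deleteHintsForEndpoint(String ipOrHostname).
            conn.invoke(new ObjectName(HINTS_MBEAN),
                        "deleteHintsForEndpoint",
                        new Object[] {deadEndpoint},
                        new String[] {String.class.getName()});
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: HintPurger <live-node-host> <dead-endpoint-ip>");
            return;
        }
        purgeHints(args[0], DEFAULT_JMX_PORT, args[1]);
    }
}
```

Each node stores hints for the writes it coordinated. For that reason, the call would be repeated against every live node, with 10.10.0.24 as the dead endpoint.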

On 12/06/2012, at 3:33 AM, Nicolas Lalevée wrote:

> Finally, thanks to the Groovy JMX builder, it was not that hard.
>
> On 11 June 2012 at 12:12, Samuel CARRIERE wrote:
>
>> If I were you, I would connect (through JMX, with jconsole) to one of the nodes that is sending messages to an old node, and have a look at these MBeans:
>>   - org.apache.net.FailureDetector: does SimpleStates look good? (Or do you see the IP of an old node?)
>
> SimpleStates:[/10.10.0.22:DOWN, /10.10.0.24:DOWN, /10.10.0.26:UP, /10.10.0.25:UP, /10.10.0.27:UP]
>
>>   - org.apache.net.MessagingService: do you see one of the old IPs in one of the attributes?

> data-5:
> CommandCompletedTasks:
> [10.10.0.22:2, 10.10.0.26:6147307, 10.10.0.27:6084684, 10.10.0.24:2]
> CommandPendingTasks:
> [10.10.0.22:0, 10.10.0.26:0, 10.10.0.27:0, 10.10.0.24:0]
> ResponseCompletedTasks:
> [10.10.0.22:1487, 10.10.0.26:6187204, 10.10.0.27:6062890, 10.10.0.24:1495]
> ResponsePendingTasks:
> [10.10.0.22:0, 10.10.0.26:0, 10.10.0.27:0, 10.10.0.24:0]
>
> data-6:
> CommandCompletedTasks:
> [10.10.0.22:2, 10.10.0.27:6064992, 10.10.0.24:2, 10.10.0.25:6308102]
> CommandPendingTasks:
> [10.10.0.22:0, 10.10.0.27:0, 10.10.0.24:0, 10.10.0.25:0]
> ResponseCompletedTasks:
> [10.10.0.22:1463, 10.10.0.27:6067943, 10.10.0.24:1474, 10.10.0.25:6367692]
> ResponsePendingTasks:
> [10.10.0.22:0, 10.10.0.27:0, 10.10.0.24:2, 10.10.0.25:0]
>
> data-7:
> CommandCompletedTasks:
> [10.10.0.22:2, 10.10.0.26:6043653, 10.10.0.24:2, 10.10.0.25:5964168]
> CommandPendingTasks:
> [10.10.0.22:0, 10.10.0.26:0, 10.10.0.24:0, 10.10.0.25:0]
> ResponseCompletedTasks:
> [10.10.0.22:1424, 10.10.0.26:6090251, 10.10.0.24:1431, 10.10.0.25:6094954]
> ResponsePendingTasks:
> [10.10.0.22:4, 10.10.0.26:0, 10.10.0.24:1, 10.10.0.25:0]

>>   - org.apache.net.StreamingService: do you see an old IP in StreamSources or StreamDestinations?
>
> Nothing is streaming on the 3 nodes.
> nodetool netstats confirmed that.
>
>>   - org.apache.internal.HintedHandoff: are there non-zero ActiveCount, CurrentlyBlockedTasks, PendingTasks, TotalBlockedTasks?
>
> On the 3 nodes, all at 0.
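The checklist above can be scripted instead of opening each MBean in jconsole. The sketch below walks every MBean that matches a pattern and reports each readable attribute whose value mentions an old IP. It is a hypothetical Java sketch that works against any MBeanServerConnection; for a Cassandra node, that is the connection obtained from JMXConnectorFactory. Values are compared through their string form. This is an assumption, but it works for the maps and strings shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper: find every MBean attribute whose value mentions a given string (e.g. an old IP).
final class MBeanSearch {
    // Object arrays print as type@hash by default, so render them element by element.
    static String render(Object value) {
        if (value instanceof Object[]) {
            return Arrays.deepToString((Object[]) value);
        }
        return String.valueOf(value);
    }

    // Returns "objectName.attribute" for each readable attribute whose rendered value contains needle.
    static List<String> findMentions(MBeanServerConnection conn, ObjectName pattern, String needle)
            throws Exception {
        List<String> hits = new ArrayList<>();
        for (ObjectName name : conn.queryNames(pattern, null)) {
            for (MBeanAttributeInfo attr : conn.getMBeanInfo(name).getAttributes()) {
                if (!attr.isReadable()) {
                    continue;
                }
                Object value;
                try {
                    value = conn.getAttribute(name, attr.getName());
                } catch (Exception e) {
                    continue; // some attributes throw when read; skip them
                }
                if (value != null && render(value).contains(needle)) {
                    hits.add(name + "." + attr.getName());
                }
            }
        }
        return hits;
    }
}
```

On one of the new nodes, a call such as findMentions(conn, new ObjectName("org.apache.cassandra.*:*"), "10.10.0.24") should list attributes like the MessagingService maps and SimpleStates quoted above.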

> I don't really know what I'm looking at, but it seems that some ResponsePendingTasks need to complete.
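The attribute dumps above all share one shape: a bracketed list of endpoint:value pairs. A few lines of code can turn them into maps and pull out the non-zero pending counts. This is a hypothetical Java sketch. It assumes IPv4 endpoints, where the last colon separates the address from the value, and an optional leading "/" as in SimpleStates.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for the "[endpoint:value, ...]" strings dumped from the MBeans above.
final class EndpointListParser {
    // Parses "[/10.10.0.22:DOWN, 10.10.0.24:2]" into {10.10.0.22=DOWN, 10.10.0.24=2}, keeping input order.
    static Map<String, String> parse(String text) {
        Map<String, String> out = new LinkedHashMap<>();
        String body = text.trim();
        if (body.startsWith("[")) {
            body = body.substring(1);
        }
        if (body.endsWith("]")) {
            body = body.substring(0, body.length() - 1);
        }
        for (String entry : body.split(",")) {
            String e = entry.trim();
            int colon = e.lastIndexOf(':'); // IPv4 only: the last colon separates address and value
            if (colon < 0) {
                continue; // empty or malformed entry
            }
            String host = e.substring(0, colon);
            if (host.startsWith("/")) {
                host = host.substring(1); // SimpleStates prefixes addresses with "/"
            }
            out.put(host, e.substring(colon + 1).trim());
        }
        return out;
    }

    // Keeps only endpoints whose numeric value is above zero (meant for the *PendingTasks maps).
    static Map<String, Integer> nonZero(Map<String, String> counts) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : counts.entrySet()) {
            int n = Integer.parseInt(entry.getValue());
            if (n > 0) {
                out.put(entry.getKey(), n);
            }
        }
        return out;
    }
}
```

Fed data-7's ResponsePendingTasks line, nonZero returns {10.10.0.22=4, 10.10.0.24=1}. The only non-zero counts are for the two removed endpoints.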

> Nicolas


>> Samuel
>>
>> Nicolas Lalevée <nicolas.lalevee@hibnet.org>
>> 08/06/2012 21:03
>> Please reply to
>> user@cassandra.apache.org
>>
>> To
>> user@cassandra.apache.org
>> cc
>> Subject
>> Re: Dead node still being pinged
>>
>> On 8 June 2012 at 20:02, Samuel CARRIERE wrote:
>>
>>> I'm on the train, so this is just a guess: maybe it's hinted handoff. A look at the logs of the new nodes could confirm it: search for the IP of an old node and you may find messages related to hinted handoff.
>>
>> I grepped the logs on every node for every old node, and found nothing since the "crash".
>>
>> In case it helps, here are some grepped log lines from the crash:

>> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 00:39:30,241 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and will not receive data for re-replication of /10.10.0.22
>> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 00:39:30,242 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and will not receive data for re-replication of /10.10.0.22
>> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 00:39:30,242 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and will not receive data for re-replication of /10.10.0.22
>> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 00:39:30,243 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and will not receive data for re-replication of /10.10.0.22
>> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] 2012-05-06 00:39:30,243 StorageService.java (line 2417) Endpoint /10.10.0.24 is down and will not receive data for re-replication of /10.10.0.22
>> system.log.1: INFO [GossipStage:1] 2012-05-06 00:44:33,822 Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead.
>> system.log.1: INFO [GossipStage:1] 2012-05-06 04:25:23,894 Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead.
>> system.log.1: INFO [OptionalTasks:1] 2012-05-06 04:25:23,895 HintedHandOffManager.java (line 179) Deleting any stored hints for /10.10.0.24
>> system.log.1: INFO [GossipStage:1] 2012-05-06 04:25:23,895 StorageService.java (line 1157) Removing token 127605887595351923798765477786913079296 for /10.10.0.24
>> system.log.1: INFO [GossipStage:1] 2012-05-09 04:26:25,015 Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead.
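Samuel's grep for an old IP can also be scripted. Grouping the matches by the class that logged them makes hinted-handoff messages (HintedHandOffManager.java) easy to tell apart from gossip and token-removal messages. This is a hypothetical Java sketch. The regular expression assumes the 1.0-style line layout shown above: level, [thread], timestamp, File.java (line N), then the message, optionally prefixed by a grep file name.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: count log lines whose message mentions an endpoint, grouped by source file.
final class EndpointLogScan {
    // Optional "system.log.1:" grep prefix, level, [thread], timestamp, Source.java (line N), message.
    private static final Pattern LINE = Pattern.compile(
        "^(?:\\S+:\\s+)?(\\w+)\\s+\\[([^\\]]+)\\]\\s+"
            + "(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})\\s+"
            + "(\\S+\\.java)\\s+\\(line \\d+\\)\\s+(.*)$");

    static Map<String, Integer> countBySource(List<String> lines, String endpoint) {
        // Cassandra logs endpoints as "/a.b.c.d"; the lookahead stops 10.10.0.2 matching 10.10.0.24.
        Pattern needle = Pattern.compile("/" + Pattern.quote(endpoint) + "(?![0-9])");
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            // Only the message part (group 5) is searched, not the thread name.
            if (m.matches() && needle.matcher(m.group(5)).find()) {
                counts.merge(m.group(4), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```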


>> Maybe it's the way I removed the nodes? AFAIR I didn't use the decommission command. For each node, I took the node down and then issued a removetoken command.
>> Here is what I can find in the log from when I removed one of them:

>> system.log.1: INFO [GossipTasks:1] 2012-05-02 17:21:10,281 Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead.
>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 17:21:21,496 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
>> system.log.1: INFO [GossipStage:1] 2012-05-02 17:21:59,307 Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead.
>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 17:31:20,336 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 17:41:06,177 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 17:51:18,148 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 18:00:31,709 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 18:11:02,521 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 18:20:38,282 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 18:31:09,513 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 18:40:31,565 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 18:51:10,566 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 19:00:32,197 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 19:11:17,018 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 19:21:21,759 HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before hint delivery, aborting
>> system.log.1: INFO [GossipStage:1] 2012-05-02 20:05:57,281 Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead.
>> system.log.1: INFO [OptionalTasks:1] 2012-05-02 20:05:57,281 HintedHandOffManager.java (line 179) Deleting any stored hints for /10.10.0.24
>> system.log.1: INFO [GossipStage:1] 2012-05-02 20:05:57,281 StorageService.java (line 1157) Removing token 145835300108973619103103718265651724288 for /10.10.0.24
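The "died before hint delivery, aborting" messages in the excerpt above recur at a fairly regular interval. The small Java sketch below measures the gaps between consecutive timestamps. It assumes the yyyy-MM-dd HH:mm:ss,SSS format used in these log lines, and LogGaps is a hypothetical name. On the thirteen abort messages above, every gap falls between roughly 9 and 11 minutes. That pattern looks like a periodic retry of hint delivery rather than new traffic, though the log alone does not prove it.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: gaps between consecutive log timestamps, in whole seconds.
final class LogGaps {
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    static List<Long> gapsInSeconds(List<String> timestamps) {
        List<Long> gaps = new ArrayList<>();
        for (int i = 1; i < timestamps.size(); i++) {
            LocalDateTime prev = LocalDateTime.parse(timestamps.get(i - 1), FORMAT);
            LocalDateTime next = LocalDateTime.parse(timestamps.get(i), FORMAT);
            // getSeconds() drops the fractional part for these positive durations.
            gaps.add(Duration.between(prev, next).getSeconds());
        }
        return gaps;
    }
}
```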


>> Nicolas
>>
>>> ----- Original message -----
>>> From: Nicolas Lalevée [nicolas.lalevee@hibnet.org]
>>> Sent: 08/06/2012 19:26 ZE2
>>> To: user@cassandra.apache.org
>>> Subject: Re: Dead node still being pinged
>>>
>>> On 8 June 2012 at 15:17, Samuel CARRIERE wrote:
>>>
>>>> What does nodetool ring say? (Ask every node.)
>>>
>>> Currently, each of the new nodes sees only the tokens of the new nodes.
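Asking every node for its ring view only helps if the answers are compared. This is a hypothetical Java sketch. It assumes each node's nodetool ring output has already been reduced by hand to the set of endpoint addresses it lists. It checks that all views agree and reports any endpoint outside the expected set, which here would be the three new nodes.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: compare the ring views reported by different nodes.
final class RingViews {
    // True when every node reports exactly the same set of endpoints.
    static boolean allAgree(Map<String, Set<String>> viewsByNode) {
        Set<String> first = null;
        for (Set<String> view : viewsByNode.values()) {
            if (first == null) {
                first = view;
            } else if (!first.equals(view)) {
                return false;
            }
        }
        return true;
    }

    // Endpoints that appear in any node's view but are not in the expected set, sorted.
    static Set<String> unexpected(Map<String, Set<String>> viewsByNode, Set<String> expected) {
        Set<String> extra = new TreeSet<>();
        for (Set<String> view : viewsByNode.values()) {
            extra.addAll(view);
        }
        extra.removeAll(expected);
        return extra;
    }
}
```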

>>>> Have you checked that the list of seeds in every yaml is correct?
>>>
>>> Yes, it is correct: every one of my new nodes points to the first of my new nodes.
>>>
>>>> What version of Cassandra are you using?
>>>
>>> Sorry, I should have written this in my first mail.
>>> I use 1.0.9.
>>>
>>> Nicolas
>>>
>>>>
>>>> Samuel



>>>> Nicolas Lalevée <nicolas.lalevee@hibnet.org>
>>>> 08/06/2012 14:10
>>>> Please reply to
>>>> user@cassandra.apache.org
>>>>
>>>> To
>>>> user@cassandra.apache.org
>>>> cc
>>>> Subject
>>>> Dead node still being pinged
>>>>
>>>> I had a configuration with 4 nodes, data-1 to data-4. We then bought 3 bigger machines, data-5 to data-7, and moved all the data from the old nodes to the new ones.
>>>> To move all the data without any interruption of service, I added the new nodes one at a time, and then removed the old machines one by one with a "remove token".
>>>>
>>>> Everything was working fine until there was an unexpected load on our cluster: the machines started to swap and became unresponsive. We fixed the unexpected load and the three new machines were restarted. After that, the new Cassandra machines were reporting that some old tokens were not assigned, namely those of data-2 and data-4. To fix this, I issued some "remove token" commands again.
>>>>
>>>> Everything seems to be back to normal, but on the network I still see some packets from the new cluster to the old machines, on port 7000.
>>>> How can I tell Cassandra to completely forget about the old machines?
>>>>
>>>> Nicolas