From user-return-26955-apmail-cassandra-user-archive=cassandra.apache.org@cassandra.apache.org Wed Jun 13 12:07:15 2012 Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 7FC849E2D for ; Wed, 13 Jun 2012 12:07:15 +0000 (UTC) Received: (qmail 95180 invoked by uid 500); 13 Jun 2012 12:07:13 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 95151 invoked by uid 500); 13 Jun 2012 12:07:12 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 95142 invoked by uid 99); 13 Jun 2012 12:07:12 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 13 Jun 2012 12:07:12 +0000 X-ASF-Spam-Status: No, hits=1.0 required=5.0 tests=FSL_RCVD_USER,SPF_HELO_PASS,SPF_SOFTFAIL X-Spam-Check-By: apache.org Received-SPF: softfail (athena.apache.org: transitioning domain of nicolas.lalevee@hibnet.org does not designate 216.86.168.183 as permitted sender) Received: from [216.86.168.183] (HELO mxout-08.mxes.net) (216.86.168.183) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 13 Jun 2012 12:07:07 +0000 Received: from [10.0.0.238] (unknown [92.103.90.130]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by smtp.mxes.net (Postfix) with ESMTPSA id BCE4D509EB for ; Wed, 13 Jun 2012 08:06:44 -0400 (EDT) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Apple Message framework v1084) Subject: Re: Dead node still being pinged From: =?iso-8859-1?Q?Nicolas_Lalev=E9e?= In-Reply-To: Date: Wed, 13 Jun 2012 14:06:43 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: References: <373F7247-8670-4DBC-8FF3-37B3F9141092@thelastpickle.com> <7B568B55-6A44-4E06-B205-2B374C13365E@hibnet.org> To: user@cassandra.apache.org X-Mailer: Apple Mail (2.1084) X-Virus-Checked: Checked by ClamAV on apache.org Le 13 juin 2012 =E0 10:30, aaron morton a =E9crit : > Here is what I *think* is going on, if Brandon is around he may be = able to help out.=20 >=20 >=20 > The old nodes are being included in the Gossip rounds, because = Gossiper.doGossipToUnreachableMember() just looks at the nodes that are = unreachable. It does not check if they have been removed from the = cluster.=20 >=20 > Information about the removed nodes is kept by gossip so that if a = node is removed while it is down it will shut down when restarted. This = information *should* stay in gossip for 3 days.=20 >=20 > In your gossip info, the last long on the STATUS lines is the expiry = time for this info=85 >=20 > /10.10.0.24 > STATUS:removed,127605887595351923798765477786913079296,1336530323263 > REMOVAL_COORDINATOR:REMOVER,0 > /10.10.0.22 > STATUS:removed,42535295865117307932921825928971026432,1336529659203 > REMOVAL_COORDINATOR:REMOVER,113427455640312814857969558651062452224 >=20 > For the first line it's=20 > In [48]: datetime.datetime.fromtimestamp(1336530323263/1000) > Out[48]: datetime.datetime(2012, 5, 9, 14, 25, 23) >=20 > So that's good.=20 >=20 > The Gossip round will remove the 0.24 and 0.22 nodes from the local = state if the expiry time has passed, and the node is marked as dead and = it's not in the token ring.=20 >=20 > You can see if the node thinks 0.24 and 0.22 are up by looking = getSimpleStates() on the FailureDetectorMBean. (I use jmxterm to do this = sort of thing) The two old nodes are still seen as down: SimpleStates:[/10.10.0.22:DOWN, /10.10.0.24:DOWN, /10.10.0.26:UP, = /10.10.0.25:UP, /10.10.0.27:UP] >=20 > The other thing that can confuse things is the gossip generation. If = your old nodes were started with a datetime in the future that can muck = things up.=20 I have just checked, my old nodes machines are nicely synchronized. My = new nodes have some lag of few seconds, some in the future, some in the = past. I definitively need to fix that. > The simple to try is starting the server with the = -Dcassandra.join_ring=3Dfalse JVM option. This will force the node to = get the ring info from othernodes. Check things with nodetool gossip = info to see if the other nodes tell it about the old ones again. You meant -Dcassandra.load_ring_state=3Dfalse right ? Then nothing changed. > Sorry, gossip can be tricky to diagnose over email.=20 No worry, I really appreciate that you take time looking into my issues. Maybe I could open a jira about my issue ? Maybe there was a config mess = on my part at some point, ie the unsynchronized date on my machines, but = I think it would be nice if cassandra could resolve itself of that = inconsistent state. Nicolas >=20 >=20 >=20 >=20 > ----------------- > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com >=20 > On 12/06/2012, at 10:33 PM, Nicolas Lalev=E9e wrote: >=20 >> I have one dirty solution to try: bring data-2 and data-4 back up and = down again. Is there any way I can tell cassandra to not get any data, = so when I would get my old node up, no streaming would start ? >>=20 >> cheers, >> Nicolas >>=20 >> Le 12 juin 2012 =E0 12:25, Nicolas Lalev=E9e a =E9crit : >>=20 >>> Le 12 juin 2012 =E0 11:03, aaron morton a =E9crit : >>>=20 >>>> Try purging the hints for 10.10.0.24 using the HintedHandOffManager = MBean. >>>=20 >>> As far as I could tell, there were no hinted hand off to be = delivered. Nevertheless I have called "deleteHintsForEndpoint" on every = node for the two expected to be out nodes. >>> Nothing changed, I still see packet being send to these old nodes. >>>=20 >>> I looked closer to ResponsePendingTasks of MessagingService. = Actually the numbers change, between 0 and about 4. So tasks are ending = but new ones come just after. >>>=20 >>> Nicolas >>>=20 >>>>=20 >>>> Cheers >>>>=20 >>>> ----------------- >>>> Aaron Morton >>>> Freelance Developer >>>> @aaronmorton >>>> http://www.thelastpickle.com >>>>=20 >>>> On 12/06/2012, at 3:33 AM, Nicolas Lalev=E9e wrote: >>>>=20 >>>>> finally, thanks to the groovy jmx builder, it was not that hard. >>>>>=20 >>>>>=20 >>>>> Le 11 juin 2012 =E0 12:12, Samuel CARRIERE a =E9crit : >>>>>=20 >>>>>> If I were you, I would connect (through JMX, with jconsole) to = one of the nodes that is sending messages to an old node, and would have = a look at these MBean :=20 >>>>>> - org.apache.net.FailureDetector : does SimpleStates looks good ? = (or do you see an IP of an old node) >>>>>=20 >>>>> SimpleStates:[/10.10.0.22:DOWN, /10.10.0.24:DOWN, /10.10.0.26:UP, = /10.10.0.25:UP, /10.10.0.27:UP] >>>>>=20 >>>>>> - org.apache.net.MessagingService : do you see one of the old IP = in one of the attributes ? >>>>>=20 >>>>> data-5: >>>>> CommandCompletedTasks: >>>>> [10.10.0.22:2, 10.10.0.26:6147307, 10.10.0.27:6084684, = 10.10.0.24:2] >>>>> CommandPendingTasks: >>>>> [10.10.0.22:0, 10.10.0.26:0, 10.10.0.27:0, 10.10.0.24:0] >>>>> ResponseCompletedTasks: >>>>> [10.10.0.22:1487, 10.10.0.26:6187204, 10.10.0.27:6062890, = 10.10.0.24:1495] >>>>> ResponsePendingTasks: >>>>> [10.10.0.22:0, 10.10.0.26:0, 10.10.0.27:0, 10.10.0.24:0] >>>>>=20 >>>>> data-6: >>>>> CommandCompletedTasks: >>>>> [10.10.0.22:2, 10.10.0.27:6064992, 10.10.0.24:2, = 10.10.0.25:6308102] >>>>> CommandPendingTasks: >>>>> [10.10.0.22:0, 10.10.0.27:0, 10.10.0.24:0, 10.10.0.25:0] >>>>> ResponseCompletedTasks: >>>>> [10.10.0.22:1463, 10.10.0.27:6067943, 10.10.0.24:1474, = 10.10.0.25:6367692] >>>>> ResponsePendingTasks: >>>>> [10.10.0.22:0, 10.10.0.27:0, 10.10.0.24:2, 10.10.0.25:0] >>>>>=20 >>>>> data-7: >>>>> CommandCompletedTasks: >>>>> [10.10.0.22:2, 10.10.0.26:6043653, 10.10.0.24:2, = 10.10.0.25:5964168] >>>>> CommandPendingTasks: >>>>> [10.10.0.22:0, 10.10.0.26:0, 10.10.0.24:0, 10.10.0.25:0] >>>>> ResponseCompletedTasks: >>>>> [10.10.0.22:1424, 10.10.0.26:6090251, 10.10.0.24:1431, = 10.10.0.25:6094954] >>>>> ResponsePendingTasks: >>>>> [10.10.0.22:4, 10.10.0.26:0, 10.10.0.24:1, 10.10.0.25:0] >>>>>=20 >>>>>> - org.apache.net.StreamingService : do you see an old IP in = StreamSources or StreamDestinations ? >>>>>=20 >>>>> nothing streaming on the 3 nodes. >>>>> nodetool netstats confirmed that. >>>>>=20 >>>>>> - org.apache.internal.HintedHandoff : are there non-zero = ActiveCount, CurrentlyBlockedTasks, PendingTasks, TotalBlockedTask ? >>>>>=20 >>>>> On the 3 nodes, all at 0. >>>>>=20 >>>>> I don't know much what I'm looking at, but it seems that some = ResponsePendingTasks needs to end. >>>>>=20 >>>>> Nicolas >>>>>=20 >>>>>>=20 >>>>>> Samuel=20 >>>>>>=20 >>>>>>=20 >>>>>>=20 >>>>>> Nicolas Lalev=E9e >>>>>> 08/06/2012 21:03 >>>>>> Veuillez r=E9pondre =E0 >>>>>> user@cassandra.apache.org >>>>>>=20 >>>>>> A >>>>>> user@cassandra.apache.org >>>>>> cc >>>>>> Objet >>>>>> Re: Dead node still being pinged >>>>>>=20 >>>>>>=20 >>>>>>=20 >>>>>>=20 >>>>>>=20 >>>>>>=20 >>>>>> Le 8 juin 2012 =E0 20:02, Samuel CARRIERE a =E9crit : >>>>>>=20 >>>>>>> I'm in the train but just a guess : maybe it's hinted handoff. A = look in the logs of the new nodes could confirm that : look for the IP = of an old node and maybe you'll find hinted handoff related messages. >>>>>>=20 >>>>>> I grepped on every node about every old node, I got nothing since = the "crash". >>>>>>=20 >>>>>> If it can be of some help, here is some grepped log of the crash: >>>>>>=20 >>>>>> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] = 2012-05-06 00:39:30,241 StorageService.java (line 2417) Endpoint = /10.10.0.24 is down and will not receive data for re-replication of = /10.10.0.22 >>>>>> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] = 2012-05-06 00:39:30,242 StorageService.java (line 2417) Endpoint = /10.10.0.24 is down and will not receive data for re-replication of = /10.10.0.22 >>>>>> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] = 2012-05-06 00:39:30,242 StorageService.java (line 2417) Endpoint = /10.10.0.24 is down and will not receive data for re-replication of = /10.10.0.22 >>>>>> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] = 2012-05-06 00:39:30,243 StorageService.java (line 2417) Endpoint = /10.10.0.24 is down and will not receive data for re-replication of = /10.10.0.22 >>>>>> system.log.1: WARN [RMI TCP Connection(1037)-10.10.0.26] = 2012-05-06 00:39:30,243 StorageService.java (line 2417) Endpoint = /10.10.0.24 is down and will not receive data for re-replication of = /10.10.0.22 >>>>>> system.log.1: INFO [GossipStage:1] 2012-05-06 00:44:33,822 = Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead. >>>>>> system.log.1: INFO [GossipStage:1] 2012-05-06 04:25:23,894 = Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead. >>>>>> system.log.1: INFO [OptionalTasks:1] 2012-05-06 04:25:23,895 = HintedHandOffManager.java (line 179) Deleting any stored hints for = /10.10.0.24 >>>>>> system.log.1: INFO [GossipStage:1] 2012-05-06 04:25:23,895 = StorageService.java (line 1157) Removing token = 127605887595351923798765477786913079296 for /10.10.0.24 >>>>>> system.log.1: INFO [GossipStage:1] 2012-05-09 04:26:25,015 = Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead. >>>>>>=20 >>>>>>=20 >>>>>> Maybe its the way I have removed nodes ? AFAIR I didn't used the = decommission command. For each node I got the node down and then issue a = remove token command. >>>>>> Here is what I can find in the log about when I removed one of = them: >>>>>>=20 >>>>>> system.log.1: INFO [GossipTasks:1] 2012-05-02 17:21:10,281 = Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead. >>>>>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 17:21:21,496 = HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before = hint delivery, aborting >>>>>> system.log.1: INFO [GossipStage:1] 2012-05-02 17:21:59,307 = Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead. >>>>>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 17:31:20,336 = HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before = hint delivery, aborting >>>>>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 17:41:06,177 = HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before = hint delivery, aborting >>>>>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 17:51:18,148 = HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before = hint delivery, aborting >>>>>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 18:00:31,709 = HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before = hint delivery, aborting >>>>>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 18:11:02,521 = HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before = hint delivery, aborting >>>>>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 18:20:38,282 = HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before = hint delivery, aborting >>>>>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 18:31:09,513 = HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before = hint delivery, aborting >>>>>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 18:40:31,565 = HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before = hint delivery, aborting >>>>>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 18:51:10,566 = HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before = hint delivery, aborting >>>>>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 19:00:32,197 = HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before = hint delivery, aborting >>>>>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 19:11:17,018 = HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before = hint delivery, aborting >>>>>> system.log.1: INFO [HintedHandoff:1] 2012-05-02 19:21:21,759 = HintedHandOffManager.java (line 292) Endpoint /10.10.0.24 died before = hint delivery, aborting >>>>>> system.log.1: INFO [GossipStage:1] 2012-05-02 20:05:57,281 = Gossiper.java (line 818) InetAddress /10.10.0.24 is now dead. >>>>>> system.log.1: INFO [OptionalTasks:1] 2012-05-02 20:05:57,281 = HintedHandOffManager.java (line 179) Deleting any stored hints for = /10.10.0.24 >>>>>> system.log.1: INFO [GossipStage:1] 2012-05-02 20:05:57,281 = StorageService.java (line 1157) Removing token = 145835300108973619103103718265651724288 for /10.10.0.24 >>>>>>=20 >>>>>>=20 >>>>>> Nicolas >>>>>>=20 >>>>>>=20 >>>>>>>=20 >>>>>>>=20 >>>>>>> ----- Message d'origine ----- >>>>>>> De : Nicolas Lalev=E9e [nicolas.lalevee@hibnet.org] >>>>>>> Envoy=E9 : 08/06/2012 19:26 ZE2 >>>>>>> =C0 : user@cassandra.apache.org >>>>>>> Objet : Re: Dead node still being pinged >>>>>>>=20 >>>>>>>=20 >>>>>>>=20 >>>>>>> Le 8 juin 2012 =E0 15:17, Samuel CARRIERE a =E9crit : >>>>>>>=20 >>>>>>>> What does nodetool ring says ? (Ask every node) >>>>>>>=20 >>>>>>> currently, each of new node see only the tokens of the new = nodes. >>>>>>>=20 >>>>>>>> Have you checked that the list of seeds in every yaml is = correct ? >>>>>>>=20 >>>>>>> yes, it is correct, every of my new node point to the first of = my new node >>>>>>>=20 >>>>>>>> What version of cassandra are you using ? >>>>>>>=20 >>>>>>> Sorry I should have wrote this in my first mail. >>>>>>> I use the 1.0.9 >>>>>>>=20 >>>>>>> Nicolas >>>>>>>=20 >>>>>>>>=20 >>>>>>>> Samuel >>>>>>>>=20 >>>>>>>>=20 >>>>>>>>=20 >>>>>>>> Nicolas Lalev=E9e >>>>>>>> 08/06/2012 14:10 >>>>>>>> Veuillez r=E9pondre =E0 >>>>>>>> user@cassandra.apache.org >>>>>>>>=20 >>>>>>>> A >>>>>>>> user@cassandra.apache.org >>>>>>>> cc >>>>>>>> Objet >>>>>>>> Dead node still being pinged >>>>>>>>=20 >>>>>>>>=20 >>>>>>>>=20 >>>>>>>>=20 >>>>>>>>=20 >>>>>>>> I had a configuration where I had 4 nodes, data-1,4. We then = bought 3 bigger machines, data-5,7. And we moved all data from data-1,4 = to data-5,7. >>>>>>>> To move all the data without interruption of service, I added = one new node at a time. And then I removed one by one the old machines = via a "remove token". >>>>>>>>=20 >>>>>>>> Everything was working fine. Until there was an expected load = on our cluster, the machine started to swap and become unresponsive. We = fixed the unexpected load and the three new machines were restarted. = After that the new cassandra machines were stating that some old token = were not assigned, namely from data-2 and data-4. To fix this I issued = again some "remove token" commands. >>>>>>>>=20 >>>>>>>> Everything seems to be back to normal, but on the network I = still see some packet from the new cluster to the old machines. On the = port 7000. >>>>>>>> How I can tell cassandra to completely forget about the old = machines ? >>>>>>>>=20 >>>>>>>> Nicolas >>>>>>>>=20 >>>>>>>>=20 >>>>>>>=20 >>>>>>=20 >>>>>>=20 >>>>>=20 >>>>=20 >>>=20 >>=20 >=20