Subject: Re: messages stopped for 3 minutes?
From: Yang
To: user@cassandra.apache.org
Date: Mon, 26 Sep 2011 01:02:37 -0700

hmmmm. never mind, possibly the first 24-second delay was caused by GC;
the GC logging was not printed in system.log, but I found one line on
stdout that possibly corresponds to it.

I found I left out the enable-parallel-remark param, let me add that
and retry.

Thanks
Yang
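(Side note on the GC theory: long pauses can also be confirmed from
inside the JVM, independently of the GC log, by polling the standard
GarbageCollectorMXBeans and watching for jumps in accumulated collection
time. Below is only a minimal sketch: the 5-second poll interval and the
1-second jump threshold are arbitrary, this is not code that ships with
Cassandra, and to watch Cassandra it would have to run as a thread inside
its JVM or be adapted to poll over JMX.)

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.util.HashMap;
    import java.util.Map;

    // Polls the JVM's garbage collector beans and warns whenever the
    // accumulated collection time jumps by more than a threshold between
    // two polls -- a rough stand-in for missing GC log lines.
    public class GcPauseWatcher
    {
        public static void main(String[] args) throws InterruptedException
        {
            Map<String, Long> lastTimes = new HashMap<String, Long>();
            while (true)
            {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans())
                {
                    long total = gc.getCollectionTime(); // ms spent in this collector so far
                    Long prev = lastTimes.put(gc.getName(), total);
                    if (prev != null && total - prev > 1000)
                        System.out.println(gc.getName() + " spent " + (total - prev)
                                           + " ms collecting in the last interval");
                }
                Thread.sleep(5000);
            }
        }
    }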
On Mon, Sep 26, 2011 at 12:25 AM, Yang wrote:
> ok, I got more detailed logging that seems to give some clue about
> this mystery ....
>
> I enabled TRACE level logging for Gossiper, and also added a debug line
> to MessageDeliveryTask so that every arriving msg shows up in the log.
>
> I have 2 nodes; all the requests come from the coordinator node.
> basically I see 2 possible issues.
>
> 1)
> in the front of bad_coord.txt, we see that the coordinator was
> receiving messages correctly until
>
> DEBUG [RequestResponseStage:22] 2011-09-26 05:43:35,521
> MessageDeliveryTask.java (line 60) delivering msg
>
> then the line right after it is
>
> DEBUG [RequestResponseStage:24] 2011-09-26 05:43:59,516
> MessageDeliveryTask.java (line 60) delivering msg
>
> so there is a 24-second delay; in this period, no messages were
> delivered on the coordinator. this is enough to trigger a Timeout and
> Unavailable. indeed we see timeouts and UnavailableException later in
> the log.
>
> during this period, on the replica box, in bad_replica.txt we see
> correct message delivery up to
>
> DEBUG [ReadStage:107] 2011-09-26 05:43:35,214 MessageDeliveryTask.java
> (line 60) delivering msg
>
> then on the next line, it starts doing something different:
>
> TRACE [GossipTasks:1] 2011-09-26 05:43:35,570 Gossiper.java (line 125)
> My heartbeat is now 5721
> TRACE [GossipTasks:1] 2011-09-26 05:43:35,570 Gossiper.java (line 357)
> Gossip Digests are : /10.71.13.5:1317009996:5765
> ec2-50-19-172-99.compute-1.amazonaws.com/10.71.21.46:1317010027:5721
>
> I did not read the code in detail to understand what this means, but
> it seems to correspond in time with the "silent period" on the coord.
> it keeps on doing this till
>
>  INFO [GossipTasks:1] 2011-09-26 05:43:53,638 Gossiper.java (line 721)
> InetAddress /10.71.13.5 is now dead.
>
> and
>
> DEBUG [MutationStage:55] 2011-09-26 05:43:59,208
> MessageDeliveryTask.java (line 60) delivering msg
>  INFO [GossipStage:1] 2011-09-26 05:43:59,211 Gossiper.java (line 707)
> InetAddress /10.71.13.5 is now UP
>
> ===== so, here the problem is that the messages stopped showing up
> from 05:43:35 to 05:43:59 on the coord, though the replica was trying
> to send them. not sure if this is due to the network.
>
>
> 2) the other problem actually causes a longer delay on the application
> side. at the end of bad_coord.txt we see that the TimedOutException and
> UnavailableException were printed out much later, after 06:06:33
>
> TRACE [GossipStage:1] 2011-09-26 06:06:33,107 Gossiper.java (line 802)
> /10.71.21.46 local generation 1317010027, remote generation 1317010027
> TRACE [GossipStage:1] 2011-09-26 06:06:33,107 Gossiper.java (line 847)
> Updating heartbeat state version to 7119 from 7118 for /10.71.21.46
> ...
> ERROR [New I/O server worker #1-15] 2011-09-26 05:43:59,500
> MY_APPLICATION_CODE.java (line 234) :
> Caused by: TimedOutException()
>
> ** the main issue here is that the timestamp of the logging line
> itself says 05:43:59, which is correctly when the TimedOutException
> was generated **, but the line was not actually printed out till
> 06:06:33, 23 minutes later. this causes a 23-minute pause visible to
> the clients.
>
> this 23-minute delay is much more damaging than the 24-second message
> drop period in 1). I wonder what could have caused this 23-minute
> delay. could it be lock contention on grabbing the log4j appender?? but
> it shouldn't be, since I only added the extra logging AFTER I saw the
> problem appear.
>
>
> Thanks for your patience for reading thus far :)
>
>
> Yang
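(The gaps above were found by eyeballing the timestamps of consecutive
"delivering msg" lines. A small scanner can do the same mechanically.
This is only a rough sketch: the log file path comes from the first
argument, the "delivering msg" marker matches the debug line described
above, and the 5-second gap threshold is an arbitrary choice.)

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.text.SimpleDateFormat;
    import java.util.Date;

    // Scans a system.log-style file and reports gaps larger than a
    // threshold between consecutive "delivering msg" lines, so silent
    // periods like 05:43:35 -> 05:43:59 stand out without manual work.
    public class LogGapFinder
    {
        public static void main(String[] args) throws Exception
        {
            SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
            long thresholdMs = 5000; // assumed gap threshold
            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            String line, prevLine = null;
            Date prev = null;
            while ((line = in.readLine()) != null)
            {
                if (!line.contains("delivering msg"))
                    continue;
                // the timestamp sits right after the "[...Stage:n]" token,
                // e.g. "2011-09-26 05:43:35,521" (23 characters)
                int start = line.indexOf(']') + 2;
                Date ts = fmt.parse(line.substring(start, start + 23));
                if (prev != null && ts.getTime() - prev.getTime() > thresholdMs)
                {
                    System.out.println("gap of " + (ts.getTime() - prev.getTime()) + " ms between:");
                    System.out.println("  " + prevLine);
                    System.out.println("  " + line);
                }
                prev = ts;
                prevLine = line;
            }
            in.close();
        }
    }

(Run it as, e.g., "java LogGapFinder bad_coord.txt".)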
> On Sun, Sep 25, 2011 at 10:32 AM, Yang wrote:
>> thanks Jonathan,
>>
>> I really don't know. I just did further tests to catch the jstack on
>> the receiving side over the last night, and am going through these
>> stacks now. if I can't find anything suspicious, I'll add this
>> debugging to the sending side too.
>>
>> another useful piece of info: when I did a single-node setup, I also
>> found a lot of TimedOutExceptions, similar to what I see with the
>> 2-node setup. I think I didn't see the UnavailableException, probably
>> because it's just a single node, and the node always believes itself
>> to be available.
>>
>> I think a GC issue is not the culprit here, since I don't see any
>> lengthy GC logging around the time the delay is happening. no
>> compaction/flushing either.
>>
>>
>> On Sun, Sep 25, 2011 at 6:33 AM, Jonathan Ellis wrote:
>>> What makes you think the problem is on the receiving node, rather than
>>> the sending node?
>>>
>>> On Sun, Sep 25, 2011 at 1:19 AM, Yang wrote:
>>>> I constantly see TimedOutException, then followed by
>>>> UnavailableException in my logs,
>>>> so I added some extra debugging to Gossiper.notifyFailureDetector()
>>>>
>>>>    void notifyFailureDetector(InetAddress endpoint, EndpointState remoteEndpointState)
>>>>    {
>>>>        IFailureDetector fd = FailureDetector.instance;
>>>>        EndpointState localEndpointState = endpointStateMap.get(endpoint);
>>>>        logger.debug("notify failure detector");
>>>>        /*
>>>>         * If the local endpoint state exists then report to the FD only
>>>>         * if the versions work out.
>>>>         */
>>>>        if ( localEndpointState != null )
>>>>        {
>>>>            logger.debug("notify failure detector, endpoint");
>>>>            int localGeneration = localEndpointState.getHeartBeatState().getGeneration();
>>>>            int remoteGeneration = remoteEndpointState.getHeartBeatState().getGeneration();
>>>>            if ( remoteGeneration > localGeneration )
>>>>            {
>>>>                localEndpointState.updateTimestamp();
>>>>                logger.debug("notify failure detector --- report 1");
>>>>                fd.report(endpoint);
>>>>                return;
>>>>            }
>>>>
>>>> then I found that this method stopped being called for a period of 3
>>>> minutes, so of course the detector considers the other side to be
>>>> dead.
>>>>
>>>> but since these 2 boxes are in the same EC2 region and the same
>>>> security group, there is no reason for a network issue to last that
>>>> long. so I ran a background job that just does
>>>>
>>>>     echo | nc $the_other_box 7000     in a loop
>>>>
>>>> and this always works fine, without ever failing to contact port 7000.
>>>>
>>>> so somehow the messages were not delivered or received. how could I
>>>> debug this?
>>>> (extra logging attached)
>>>>
>>>> Thanks
>>>> Yang
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
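(One more note on the connectivity check in the earliest message above:
"echo | nc $the_other_box 7000" only shows that the port still accepts
new connections. Timing each connect as well would show whether the
storage port ever becomes slow rather than outright unreachable. Below
is a rough Java equivalent of that loop; the host comes from the first
argument, and port 7000 and the 2-second connect timeout are assumptions
taken from the thread.)

    import java.net.InetSocketAddress;
    import java.net.Socket;

    // Repeatedly opens a TCP connection to the other node's storage port
    // and logs how long each connect takes, flagging slow or failed tries.
    public class PortProbe
    {
        public static void main(String[] args) throws InterruptedException
        {
            String host = args[0]; // the other box's address
            int port = 7000;       // Cassandra storage port, as in the nc loop
            while (true)
            {
                long start = System.currentTimeMillis();
                try
                {
                    Socket s = new Socket();
                    s.connect(new InetSocketAddress(host, port), 2000); // 2s timeout (assumed)
                    s.close();
                    long elapsed = System.currentTimeMillis() - start;
                    if (elapsed > 500)
                        System.out.println("slow connect: " + elapsed + " ms");
                }
                catch (Exception e)
                {
                    System.out.println("connect failed after "
                                       + (System.currentTimeMillis() - start) + " ms: " + e);
                }
                Thread.sleep(1000);
            }
        }
    }

(Even when this passes, it only proves that new connections can be made;
it says nothing about the already-established inter-node connections,
which is consistent with the nc loop succeeding while message delivery
still stalled.)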