Subject: Re: messages stopped for 3 minutes?
From: Yang
To: user@cassandra.apache.org
Date: Mon, 26 Sep 2011 01:02:37 -0700

hmmmm. never mind, possibly the first 24-second delay was caused by GC;
the GC logging was not printed in system.log, but I found one line on
stdout that possibly corresponds to it.

I found I left out the enable-parallel-remark param, let me add that
and retry.

Thanks
Yang
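(Side note on the GC theory: long pauses can also be confirmed from
inside the JVM, independently of the GC log, by polling the standard
GarbageCollectorMXBeans and watching for jumps in accumulated collection
time. Below is only a minimal sketch: the 5-second poll interval and the
1-second jump threshold are arbitrary, this is not code that ships with
Cassandra, and to watch Cassandra it would have to run as a thread inside
its JVM or be adapted to poll over JMX.)

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.util.HashMap;
    import java.util.Map;

    // Polls the JVM's garbage collector beans and warns whenever the
    // accumulated collection time jumps by more than a threshold between
    // two polls -- a rough stand-in for missing GC log lines.
    public class GcPauseWatcher
    {
        public static void main(String[] args) throws InterruptedException
        {
            Map<String, Long> lastTimes = new HashMap<String, Long>();
            while (true)
            {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans())
                {
                    long total = gc.getCollectionTime(); // ms spent in this collector so far
                    Long prev = lastTimes.put(gc.getName(), total);
                    if (prev != null && total - prev > 1000)
                        System.out.println(gc.getName() + " spent " + (total - prev)
                                           + " ms collecting in the last interval");
                }
                Thread.sleep(5000);
            }
        }
    }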
On Mon, Sep 26, 2011 at 12:25 AM, Yang wrote:
> ok, I got more detailed logging that seems to give some clue about
> this mystery ....
>
> I enabled TRACE level logging for Gossiper, and also added a debug line
> to MessageDeliveryTask so that every arriving msg shows up in the log.
>
> I have 2 nodes; all the requests come from the coordinator node.
> basically I see 2 possible issues.
>
> 1)
> in the front of bad_coord.txt, we see that the coordinator was
> receiving messages correctly until
>
> DEBUG [RequestResponseStage:22] 2011-09-26 05:43:35,521
> MessageDeliveryTask.java (line 60) delivering msg
>
> then the line right after it is
>
> DEBUG [RequestResponseStage:24] 2011-09-26 05:43:59,516
> MessageDeliveryTask.java (line 60) delivering msg
>
> so there is a 24-second delay; in this period, no messages were
> delivered on the coordinator. this is enough to trigger a Timeout and
> Unavailable. indeed we see timeouts and UnavailableException later in
> the log.
>
> during this period, on the replica box, in bad_replica.txt we see
> correct message delivery up to
>
> DEBUG [ReadStage:107] 2011-09-26 05:43:35,214 MessageDeliveryTask.java
> (line 60) delivering msg
>
> then on the next line, it starts doing something different:
>
> TRACE [GossipTasks:1] 2011-09-26 05:43:35,570 Gossiper.java (line 125)
> My heartbeat is now 5721
> TRACE [GossipTasks:1] 2011-09-26 05:43:35,570 Gossiper.java (line 357)
> Gossip Digests are : /10.71.13.5:1317009996:5765
> ec2-50-19-172-99.compute-1.amazonaws.com/10.71.21.46:1317010027:5721
>
> I did not read the code in detail to understand what this means, but
> it seems to correspond in time with the "silent period" on the coord.
> it keeps on doing this till
>
>  INFO [GossipTasks:1] 2011-09-26 05:43:53,638 Gossiper.java (line 721)
> InetAddress /10.71.13.5 is now dead.
>
> and
>
> DEBUG [MutationStage:55] 2011-09-26 05:43:59,208
> MessageDeliveryTask.java (line 60) delivering msg
>  INFO [GossipStage:1] 2011-09-26 05:43:59,211 Gossiper.java (line 707)
> InetAddress /10.71.13.5 is now UP
>
> ===== so, here the problem is that the messages stopped showing up
> from 05:43:35 to 05:43:59 on the coord, though the replica was trying
> to send them. not sure if this is due to the network.
>
>
> 2) the other problem actually causes a longer delay on the application
> side. at the end of bad_coord.txt we see that the TimedOutException and
> UnavailableException were printed out much later, after 06:06:33
>
> TRACE [GossipStage:1] 2011-09-26 06:06:33,107 Gossiper.java (line 802)
> /10.71.21.46 local generation 1317010027, remote generation 1317010027
> TRACE [GossipStage:1] 2011-09-26 06:06:33,107 Gossiper.java (line 847)
> Updating heartbeat state version to 7119 from 7118 for /10.71.21.46
> ...
> ERROR [New I/O server worker #1-15] 2011-09-26 05:43:59,500
> MY_APPLICATION_CODE.java (line 234) :
> Caused by: TimedOutException()
>
> ** the main issue here is that the timestamp of the logging line
> itself says 05:43:59, which is correctly when the TimedOutException
> was generated **, but the line was not actually printed out till
> 06:06:33, 23 minutes later. this causes a 23-minute pause visible to
> the clients.
>
> this 23-minute delay is much more damaging than the 24-second message
> drop period in 1). I wonder what could have caused this 23-minute
> delay. could it be lock contention on grabbing the log4j appender?? but
> it shouldn't be, since I only added the extra logging AFTER I saw the
> problem appear.
>
>
> Thanks for your patience for reading thus far :)
>
>
> Yang
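(The gaps above were found by eyeballing the timestamps of consecutive
"delivering msg" lines. A small scanner can do the same mechanically.
This is only a rough sketch: the log file path comes from the first
argument, the "delivering msg" marker matches the debug line described
above, and the 5-second gap threshold is an arbitrary choice.)

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.text.SimpleDateFormat;
    import java.util.Date;

    // Scans a system.log-style file and reports gaps larger than a
    // threshold between consecutive "delivering msg" lines, so silent
    // periods like 05:43:35 -> 05:43:59 stand out without manual work.
    public class LogGapFinder
    {
        public static void main(String[] args) throws Exception
        {
            SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
            long thresholdMs = 5000; // assumed gap threshold
            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            String line, prevLine = null;
            Date prev = null;
            while ((line = in.readLine()) != null)
            {
                if (!line.contains("delivering msg"))
                    continue;
                // the timestamp sits right after the "[...Stage:n]" token,
                // e.g. "2011-09-26 05:43:35,521" (23 characters)
                int start = line.indexOf(']') + 2;
                Date ts = fmt.parse(line.substring(start, start + 23));
                if (prev != null && ts.getTime() - prev.getTime() > thresholdMs)
                {
                    System.out.println("gap of " + (ts.getTime() - prev.getTime()) + " ms between:");
                    System.out.println("  " + prevLine);
                    System.out.println("  " + line);
                }
                prev = ts;
                prevLine = line;
            }
            in.close();
        }
    }

(Run it as, e.g., "java LogGapFinder bad_coord.txt".)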
> On Sun, Sep 25, 2011 at 10:32 AM, Yang wrote:
>> thanks Jonathan,
>>
>> I really don't know. I just did further tests to catch the jstack on
>> the receiving side over the last night, and am going through these
>> stacks now. if I can't find anything suspicious, I'll add this
>> debugging to the sending side too.
>>
>> another useful piece of info: when I did a single-node setup, I also
>> found a lot of TimedOutExceptions, similar to what I see with the
>> 2-node setup. I think I didn't see the UnavailableException, probably
>> because it's just a single node, and the node always believes itself
>> to be available.
>>
>> I think a GC issue is not the culprit here, since I don't see any
>> lengthy GC logging around the time the delay is happening. no
>> compaction/flushing either.
>>
>>
>> On Sun, Sep 25, 2011 at 6:33 AM, Jonathan Ellis wrote:
>>> What makes you think the problem is on the receiving node, rather than
>>> the sending node?
>>>
>>> On Sun, Sep 25, 2011 at 1:19 AM, Yang wrote:
>>>> I constantly see TimedOutException, then followed by
>>>> UnavailableException in my logs,
>>>> so I added some extra debugging to Gossiper.notifyFailureDetector()
>>>>
>>>>    void notifyFailureDetector(InetAddress endpoint, EndpointState remoteEndpointState)
>>>>    {
>>>>        IFailureDetector fd = FailureDetector.instance;
>>>>        EndpointState localEndpointState = endpointStateMap.get(endpoint);
>>>>        logger.debug("notify failure detector");
>>>>        /*
>>>>         * If the local endpoint state exists then report to the FD only
>>>>         * if the versions work out.
>>>>         */
>>>>        if ( localEndpointState != null )
>>>>        {
>>>>            logger.debug("notify failure detector, endpoint");
>>>>            int localGeneration = localEndpointState.getHeartBeatState().getGeneration();
>>>>            int remoteGeneration = remoteEndpointState.getHeartBeatState().getGeneration();
>>>>            if ( remoteGeneration > localGeneration )
>>>>            {
>>>>                localEndpointState.updateTimestamp();
>>>>                logger.debug("notify failure detector --- report 1");
>>>>                fd.report(endpoint);
>>>>                return;
>>>>            }
>>>>
>>>> then I found that this method stopped being called for a period of 3
>>>> minutes, so of course the detector considers the other side to be
>>>> dead.
>>>>
>>>> but since these 2 boxes are in the same EC2 region and the same
>>>> security group, there is no reason for a network issue to last that
>>>> long. so I ran a background job that just does
>>>>
>>>>     echo | nc $the_other_box 7000     in a loop
>>>>
>>>> and this always works fine, without ever failing to contact port 7000.
>>>>
>>>> so somehow the messages were not delivered or received. how could I
>>>> debug this?
>>>> (extra logging attached)
>>>>
>>>> Thanks
>>>> Yang
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
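(One more note on the connectivity check in the earliest message above:
"echo | nc $the_other_box 7000" only shows that the port still accepts
new connections. Timing each connect as well would show whether the
storage port ever becomes slow rather than outright unreachable. Below
is a rough Java equivalent of that loop; the host comes from the first
argument, and port 7000 and the 2-second connect timeout are assumptions
taken from the thread.)

    import java.net.InetSocketAddress;
    import java.net.Socket;

    // Repeatedly opens a TCP connection to the other node's storage port
    // and logs how long each connect takes, flagging slow or failed tries.
    public class PortProbe
    {
        public static void main(String[] args) throws InterruptedException
        {
            String host = args[0]; // the other box's address
            int port = 7000;       // Cassandra storage port, as in the nc loop
            while (true)
            {
                long start = System.currentTimeMillis();
                try
                {
                    Socket s = new Socket();
                    s.connect(new InetSocketAddress(host, port), 2000); // 2s timeout (assumed)
                    s.close();
                    long elapsed = System.currentTimeMillis() - start;
                    if (elapsed > 500)
                        System.out.println("slow connect: " + elapsed + " ms");
                }
                catch (Exception e)
                {
                    System.out.println("connect failed after "
                                       + (System.currentTimeMillis() - start) + " ms: " + e);
                }
                Thread.sleep(1000);
            }
        }
    }

(Even when this passes, it only proves that new connections can be made;
it says nothing about the already-established inter-node connections,
which is consistent with the nc loop succeeding while message delivery
still stalled.)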