From: Michael Theroux
To: user@cassandra.apache.org
Subject: Re: Really odd issue (AWS related?)
Date: Fri, 26 Apr 2013 10:14:23 -0400

Thanks.

We weren't monitoring this value when the issue occurred, and this particular issue has not appeared for a couple of days (knock on wood).  Will keep an eye out though,

-Mike

On Apr 26, 2013, at 5:32 AM, Jason Wee wrote:

Have you tried the top command? st: time stolen from this VM by the hypervisor

jason
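
For anyone who wants to track the steal value over time rather than watch top, here is a minimal sketch in Python. It assumes a Linux guest where the aggregate "cpu" line of /proc/stat lists steal as the eighth counter; the function names are made up for illustration and are not part of Cassandra or any monitoring tool.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def steal_percent(before, after):
    """Return steal time as a percentage of all CPU time between two samples."""
    # Assumed field order on Linux:
    # user nice system idle iowait irq softirq steal guest guest_nice
    deltas = [b - a for a, b in zip(before, after)]
    if len(deltas) < 8:
        # Very old kernels do not report steal at all.
        return 0.0
    # guest and guest_nice are already counted inside user and nice,
    # so only the first eight counters are summed.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (the aggregate 'cpu' line)."""
    with open(path) as handle:
        return handle.readline()


# Example use on a Linux host (not executed here):
#   import time
#   first = parse_cpu_line(read_cpu_line())
#   time.sleep(60)
#   second = parse_cpu_line(read_cpu_line())
#   print(steal_percent(first, second))
```

Sampling once a minute would line up with the per-minute granularity of the AWS detailed monitoring mentioned in this thread, but the interval is only a suggestion.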


On Fri, Apr 26, 2013 at 9:54 AM, Michael Theroux <mtheroux2@yahoo.com> wrote:
Sorry, not sure what CPU steal is :)

I have AWS console with detailed monitoring enabled... things seem to track close to the minute, so I can see the CPU load go to 0... then jump at about the minute Cassandra reports the dropped messages,

-Mike

On Apr 25, 2013, at 9:50 PM, aaron morton wrote:

The messages appear right after the node "wakes up".
Are you tracking CPU steal?

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton

On 25/04/2013, at 4:15 AM, Robert Coli <rcoli@eventbrite.com> wrote:

On Wed, Apr 24, 2013 at 5:03 AM, Michael Theroux <mtheroux2@yahoo.com> wrote:
Another related question. Once we see messages being dropped on one node, our Cassandra client appears to see this, reporting errors. We use LOCAL_QUORUM with an RF of 3 on all queries. Any idea why clients would see an error? If only one node reports an error, shouldn't the consistency level prevent the client from seeing an issue?

If the client is talking to a broken/degraded coordinator node, RF/CL
are unable to protect it from RPCTimeout. If the coordinator is unable
to coordinate the request in a timely fashion, your clients will get
errors.

=Rob
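
To make the point above concrete, here is a toy model in Python. It is only a sketch of the reasoning, not how Cassandra or any client driver is implemented; the function names and the simple healthy/unhealthy flag are assumptions for illustration. A quorum is floor(RF / 2) + 1, so with RF 3 two replicas must answer, but that only helps if the coordinator itself can handle the request in time.

```python
def quorum(replication_factor):
    """Number of replicas that must respond for a quorum read or write."""
    return replication_factor // 2 + 1


def request_succeeds(coordinator_healthy, replicas_responding, replication_factor=3):
    """Toy model: does a quorum request succeed from the client's point of view?"""
    # A degraded coordinator cannot collect replica responses before the
    # timeout, so the client sees an error however many replicas are healthy.
    if not coordinator_healthy:
        return False
    # With a healthy coordinator, the consistency level tolerates replica loss
    # as long as a quorum still responds.
    return replicas_responding >= quorum(replication_factor)


print(quorum(3))
print(request_succeeds(True, 2))
print(request_succeeds(False, 3))
```

In this model one slow replica is tolerated when another node coordinates, while a slow coordinator fails the request even with all three replicas available.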



