Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 158E6ED33 for ; Tue, 25 Jun 2013 03:02:35 +0000 (UTC) Received: (qmail 3675 invoked by uid 500); 25 Jun 2013 03:02:32 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 3652 invoked by uid 500); 25 Jun 2013 03:02:31 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 3644 invoked by uid 99); 25 Jun 2013 03:02:30 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 25 Jun 2013 03:02:30 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of Arthur.Zubarev@aol.com designates 64.12.143.79 as permitted sender) Received: from [64.12.143.79] (HELO omr-m05.mx.aol.com) (64.12.143.79) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 25 Jun 2013 03:02:23 +0000 Received: from mtaout-ma06.r1000.mx.aol.com (mtaout-ma06.r1000.mx.aol.com [172.29.41.6]) by omr-m05.mx.aol.com (Outbound Mail Relay) with ESMTP id C42F9700680AA for ; Mon, 24 Jun 2013 23:02:01 -0400 (EDT) Received: from CompudictedHP (CPE185933f3db5c-CM185933f3db59.cpe.net.cable.rogers.com [99.238.22.30]) by mtaout-ma06.r1000.mx.aol.com (MUA/Third Party Client Interface) with ESMTPA id 58FD2E0000A5 for ; Mon, 24 Jun 2013 23:02:01 -0400 (EDT) Message-ID: From: "Arthur Zubarev" To: References: <78073DF1CF7E49EA8CF68C5C79F8FB30@keen.io> In-Reply-To: <78073DF1CF7E49EA8CF68C5C79F8FB30@keen.io> Subject: Re: Counter value becomes incorrect after several dozen reads & writes Date: Mon, 24 Jun 2013 23:01:55 -0400 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_015B_01CE712E.D2536C80" X-Priority: 3 X-MSMail-Priority: Normal Importance: Normal X-Mailer: Microsoft Windows Live Mail 15.4.3555.308 X-MimeOLE: Produced By Microsoft MimeOLE V15.4.3555.308 x-aol-global-disposition: G DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mx.aol.com; s=20121107; t=1372129321; bh=jl1MM6ywDl99BLKKWAXkRDZZRFsLy/EwdQFyjyDMQec=; h=From:To:Subject:Message-ID:Date:MIME-Version:Content-Type; b=oP78qKemD/FdmyTfhtkGeg1wkeUTiHt48zUHivL66dINaF1KK8JcQX8jhYE4R8AEk kppX8f9DXfyfF8LNZZoFSoUH0bX1fR9myD84gD8g0vutvFKSwpty1zeKHqg4SvVC1A 7qKZjlHGbI4B9LarkpinTn8bKqlrmEISWb1heQTc= X-AOL-SCOLL-SCORE: 1:2:492158784:93952408 X-AOL-SCOLL-URL_COUNT: 1 x-aol-sid: 3039ac1d290651c90829739c X-AOL-IP: 99.238.22.30 X-Virus-Checked: Checked by ClamAV on apache.org This is a multi-part message in MIME format. ------=_NextPart_000_015B_01CE712E.D2536C80 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Josh, are you looking at the read counter produced by cfstats? If so it is not for a CF, but the entire KS and not tied to a specific = operation, but rather per the entire lifetime of JVM. Just in case, some supporting info: = http://stackoverflow.com/questions/9431590/cassandra-cfstats-and-meaning-= of-read-write-latency /Arthur From: Josh Dzielak=20 Sent: Monday, June 24, 2013 9:42 PM To: user@cassandra.apache.org=20 Subject: Counter value becomes incorrect after several dozen reads & = writes I have a loop that reads a counter, increments it by some integer, then = goes off and does about 500ms of other work. After about 10 iterations = of this loop, the counter value *sometimes* appears to be corrupted. Looking at the logs, a sequence that just happened is: Read counter - 15000 Increase counter by - 353 Read counter - 15353 Increase counter by - 1067 Read counter - 286079 (the new counter value is *very* different than = what the increase should have produced, but usually, suspiciously, = around 280k) Increase counter by - 875 Read counter - 286079 (the counter stops changing at a certain point) There is only 1 thread running this sequence, and consistency levels are = set to ALL. The behavior is fairly repeatable - the unexpectation = mutation will happen at least 10% of the time I run this program, but at = different points. When it does not go awry, I can run this loop many = thousands of times and keep the counter exact. But if it starts = happening to a specific counter, the counter will never "recover" and = will continue to maintain it's incorrect value even after successful = subsequent writes. I'm using the latest Astyanax driver on Cassandra 1.2.3 in a 3-node test = cluster. It's also happened in development. Has anyone seem something = like this? It feels almost too strange to be an actual bug but I'm = stumped and have been looking at it too long :) Thanks, Josh -- Josh Dzielak =20 VP Engineering =E2=80=A2 Keen IO Twitter =E2=80=A2 @dzello Mobile =E2=80=A2 773-540-5264 ------=_NextPart_000_015B_01CE712E.D2536C80 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Josh,
 
are you looking at the read counter produced by cfstats?
 
If so it is not for a CF, but the entire KS and not tied to a = specific=20 operation, but rather per the entire lifetime of JVM.
 
Just in case, some supporting info: http://stackoverflow.com/questions/9431590/cassandra-cfstats-and-m= eaning-of-read-write-latency
 
/Arthur
 
Sent: Monday, June 24, 2013 9:42 PM
Subject: Counter value becomes incorrect after several dozen = reads=20 & writes
 
I have a loop that reads a counter, increments it by some integer, = then=20 goes off and does about 500ms of other work. After about 10 iterations = of this=20 loop, the counter value *sometimes* appears to be corrupted.
 
Looking at the logs, a sequence that just happened is:
 
Read counter - 15000
Increase counter by - 353
Read counter - 15353
Increase counter by - 1067
Read counter - 286079 (the new counter value is *very* different = than what=20 the increase should have produced, but usually, suspiciously, around = 280k)
Increase counter by - 875
Read counter - 286079  (the counter stops changing at a = certain=20 point)
 
There is only 1 thread running this sequence, and consistency = levels are=20 set to ALL. The behavior is fairly repeatable - the unexpectation = mutation will=20 happen at least 10% of the time I run this program, but at different = points.=20 When it does not go awry, I can run this loop many thousands of times = and keep=20 the counter exact. But if it starts happening to a specific counter, the = counter=20 will never "recover" and will continue to maintain it's incorrect value = even=20 after successful subsequent writes.
 
I'm using the latest Astyanax driver on Cassandra 1.2.3 in a 3-node = test=20 cluster. It's also happened in development. Has anyone seem something = like this?=20 It feels almost too strange to be an actual bug but I'm stumped and have = been=20 looking at it too long :)
 
Thanks,
Josh
 
--
Josh = Dzielak   =20
VP Engineering =E2=80=A2 Keen = IO
Twitter =E2=80=A2 @dzello
Mobile =E2=80=A2 = 773-540-5264
 
------=_NextPart_000_015B_01CE712E.D2536C80--