Message-ID: <B2AC2A31C74B412DBC7EF062967B0A35@CompudictedHP>
From: "Arthur Zubarev" <Arthur.Zubarev@Aol.com>
To: <user@cassandra.apache.org>
References: <78073DF1CF7E49EA8CF68C5C79F8FB30@keen.io>
In-Reply-To: <78073DF1CF7E49EA8CF68C5C79F8FB30@keen.io>
Subject: Re: Counter value becomes incorrect after several dozen reads &
 writes
Date: Mon, 24 Jun 2013 23:01:55 -0400
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_015B_01CE712E.D2536C80"

This is a multi-part message in MIME format.

------=_NextPart_000_015B_01CE712E.D2536C80
Content-Type: text/plain;
	charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi Josh,

Are you looking at the read counter produced by cfstats?

If so, it is not for a CF but for the entire KS, and it is not tied to =
a specific operation; it accumulates over the entire lifetime of the =
JVM.

Just in case, some supporting info: =
http://stackoverflow.com/questions/9431590/cassandra-cfstats-and-meaning-=
of-read-write-latency

/Arthur

From: Josh Dzielak=20
Sent: Monday, June 24, 2013 9:42 PM
To: user@cassandra.apache.org=20
Subject: Counter value becomes incorrect after several dozen reads & =
writes

I have a loop that reads a counter, increments it by some integer, then =
goes off and does about 500ms of other work. After about 10 iterations =
of this loop, the counter value *sometimes* appears to be corrupted.

Looking at the logs, a sequence that just happened is:

Read counter - 15000
Increase counter by - 353
Read counter - 15353
Increase counter by - 1067
Read counter - 286079 (the new counter value is *very* different than =
what the increase should have produced, but usually, suspiciously, =
around 280k)
Increase counter by - 875
Read counter - 286079  (the counter stops changing at a certain point)
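
As a sanity check on the log above, here is a minimal sketch (an
illustration only, not code from the original program) that replays the
logged increments and reports where the observed reads diverge from the
expected running total. The function name `replay` and the tuple layout
are hypothetical.

```python
def replay(start, steps):
    """Replay (increment, observed_read) pairs and flag mismatches.

    Returns a list of (step_index, expected, observed) for every read
    that does not match the running total.
    """
    mismatches = []
    expected = start
    for i, (increment, observed) in enumerate(steps):
        expected += increment
        if observed != expected:
            mismatches.append((i, expected, observed))
            # Follow the value the store actually reported, so later
            # steps are judged relative to what was read back.
            expected = observed
    return mismatches


# Values taken from the log sequence above.
log_steps = [(353, 15353), (1067, 286079), (875, 286079)]
for index, expected, observed in replay(15000, log_steps):
    print(f"step {index}: expected {expected}, read {observed}")
```

For the logged values this flags the second and third reads: the second
should have been 16420, and the third shows that the final increment of
875 had no visible effect.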

There is only 1 thread running this sequence, and consistency levels are =
set to ALL. The behavior is fairly repeatable - the unexpected =
mutation will happen at least 10% of the time I run this program, but at =
different points. When it does not go awry, I can run this loop many =
thousands of times and keep the counter exact. But if it starts =
happening to a specific counter, the counter will never "recover" and =
will continue to maintain its incorrect value even after successful =
subsequent writes.

I'm using the latest Astyanax driver on Cassandra 1.2.3 in a 3-node test =
cluster. It's also happened in development. Has anyone seen something =
like this? It feels almost too strange to be an actual bug but I'm =
stumped and have been looking at it too long :)

Thanks,
Josh

--
Josh Dzielak   =20
VP Engineering =E2=80=A2 Keen IO
Twitter =E2=80=A2 @dzello
Mobile =E2=80=A2 773-540-5264


------=_NextPart_000_015B_01CE712E.D2536C80--