Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of arodrime@gmail.com designates
 209.85.212.178 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <71E6575E-2555-48BA-B0D8-8AEE5E055ECB@thelastpickle.com>
References: 
 <CA+VSrLrCgTip_QRe12rFdyfky17ZjBx4Dv2RyPPNZXW+0D7rKA@mail.gmail.com>
 <CA+VSrLrsK9y0tdkTKscYV9YCuoHEDceMNK7_gLx+k20OD=x_Ag@mail.gmail.com>
 <71E6575E-2555-48BA-B0D8-8AEE5E055ECB@thelastpickle.com>
From: Alain RODRIGUEZ <arodrime@gmail.com>
Date: Fri, 13 Apr 2012 14:24:40 +0200
Message-ID: 
 <CA+VSrLoW8eWF2wgnkHucKfHqfBtCiS7wquPtps6vvW8+cAzaRg@mail.gmail.com>
Subject: Re: Trouble with wrong data
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=f46d0442827ab14c0a04bd8e8f85

--f46d0442827ab14c0a04bd8e8f85
Content-Type: text/plain; charset=ISO-8859-1

commitlog_total_space_in_mb was not set. I have now set it to avoid hitting
the same problem in the future.

I am aware of the over-counting problem that comes with counters. The
point is that I use them to compute hourly statistics. I can understand
having some wrong counts in the column corresponding to the crash time, but
how can it be that all my counts since the start (months ago) became
wrong after the crash?
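
To make clear what I mean by over-counting, here is a minimal sketch in
Python. It is a toy in-memory model and not the Cassandra client API:
FlakyCounterStore, increment_with_retry and the hour bucket names are made
up for the illustration. It assumes the write is applied but the
acknowledgement is lost, so the client retries.

```python
from collections import defaultdict


class FlakyCounterStore:
    """Toy stand-in for a counter column family (hypothetical, not Cassandra's API)."""

    def __init__(self, fail_on_calls):
        self.counts = defaultdict(int)
        self.calls = 0
        self.fail_on_calls = set(fail_on_calls)

    def increment(self, hour_bucket, delta=1):
        self.calls += 1
        # The increment is applied first ...
        self.counts[hour_bucket] += delta
        # ... and on selected calls the acknowledgement is lost afterwards.
        if self.calls in self.fail_on_calls:
            raise TimeoutError("ack lost, but the write was applied")


def increment_with_retry(store, hour_bucket, retries=1):
    """Retry on timeout; because increments are not idempotent, a retry adds again."""
    for attempt in range(retries + 1):
        try:
            store.increment(hour_bucket)
            return
        except TimeoutError:
            if attempt == retries:
                raise


# One event per hour; the third call to the store loses its acknowledgement.
store = FlakyCounterStore(fail_on_calls={3})
for hour in ["2012-04-10T00", "2012-04-10T01", "2012-04-10T02", "2012-04-10T03"]:
    increment_with_retry(store, hour)

print(dict(store.counts))
```

In this model only the bucket that was being written when the timeout
happened ends up over-counted; the earlier hourly buckets keep their values.
That is the behaviour I expected, and it is not what I am seeing.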

After the crash I tried to repair my entire keyspace from one of the 2
nodes, and this made my server crash again; I have no idea why. Could this
failed repair be the cause of the corrupted data?

I'm still replaying all my counts from the past months, and I'm afraid this
kind of bug could happen again...

I had been using Cassandra for months without any issue.

Alain

2012/4/11 aaron morton <aaron@thelastpickle.com>

>> However, after recovering from this issue (freeing some space and fixing
>> the value of "commitlog_total_space_in_mb" in cassandra.yaml)
>>
> Did the commit log grow larger than commitlog_total_space_in_mb?
>
>> I realized that all my statistics had been destroyed. I have bad values on
>> every single counter since I started using them (September)!
>>
> Counter operations are not idempotent. If your client retries a counter
> operation, it may result in the increment being applied twice. Could this
> have been your issue?
>
> Cheers
>
>
>   -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 11/04/2012, at 2:35 AM, Alain RODRIGUEZ wrote:
>
> By the way, I am using Cassandra 1.0.7, CL = ONE (R/W), RF = 2, on a
> cluster of 2 EC2 c1.medium nodes.
>
> Alain
>
> 2012/4/10 Alain RODRIGUEZ <arodrime@gmail.com>
>
>> Hi, I'm experiencing a strange and very annoying phenomenon.
>>
>> I had a problem with the commit log, which grew too large and filled one
>> of the hard disks on all my nodes at almost the same time (2 nodes only,
>> RF=2, so the 2 nodes behave in exactly the same way).
>>
>> My data is mounted on another partition, which was not full. However,
>> after recovering from this issue (freeing some space and fixing the value
>> of "commitlog_total_space_in_mb" in cassandra.yaml) I realized that all
>> my statistics had been destroyed. I have bad values on every single
>> counter since I started using them (September)!
>>
>> Has anyone experienced something similar, or does anyone have a clue
>> about this?
>>
>> Do you need more information?
>>
>> Alain
>>
>
>
>

--f46d0442827ab14c0a04bd8e8f85--