Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAH5CebyLL9f2yDGrfphsm0ht1nXhmwwDe-qBwqEMPPwW-8_i5g@mail.gmail.com>
References: 
 <CAH5CebxtEmcRcXux5fyQbeLNuKeuA94yM68BzK75u7B5OXWm+w@mail.gmail.com>
 <CAH5CebyLL9f2yDGrfphsm0ht1nXhmwwDe-qBwqEMPPwW-8_i5g@mail.gmail.com>
From: Sebastian Estevez <sebastian.estevez@datastax.com>
Date: Tue, 20 Oct 2015 12:34:43 -0400
Message-ID: 
 <CACCmCN-oNVs57aO79Do0D6E3VRRDfDxXmwWXiur3uguy1C_feg@mail.gmail.com>
Subject: Re: "invalid global counter shard detected" warning on 2.1.3 and
 2.1.10
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=047d7b45116802ff4805228bd8d3

--047d7b45116802ff4805228bd8d3
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Hi Branton,


>    - How much should we be freaking out?

The impact is possible counter inaccuracy (overcounting or undercounting).
If you expect counters to be exactly accurate, you are already in trouble,
because they are not: counter increments are not idempotent, and they run
in a distributed system (you've probably read Aleksey's post
<http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters>
by now).
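
To make the retry problem concrete, here is a minimal Python sketch. It is a
toy model, not Cassandra or driver code: FlakyCounter and
increment_with_retry are hypothetical names, and the "lost ack" is simulated.
It shows how a write that was applied but reported as timed out gets counted
twice when the client retries.

```python
# Toy model of a non-idempotent increment with client retries.
# FlakyCounter and increment_with_retry are hypothetical, not Cassandra APIs.

class FlakyCounter:
    """A counter whose first write is applied but reported as timed out."""

    def __init__(self):
        self.value = 0
        self._fail_next_ack = True

    def increment(self, delta):
        self.value += delta                  # the write is applied...
        if self._fail_next_ack:
            self._fail_next_ack = False
            raise TimeoutError("ack lost")   # ...but the client never hears back


def increment_with_retry(counter, delta, retries=1):
    """Retry on timeout: the client cannot know whether the write landed."""
    for _ in range(retries + 1):
        try:
            counter.increment(delta)
            return True
        except TimeoutError:
            continue
    return False


counter = FlakyCounter()
increment_with_retry(counter, 1)
print(counter.value)  # prints 2, although the client intended +1
```

Not retrying has the opposite risk: if the timed-out write was in fact lost,
giving up leaves the counter too low.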

>    - Why is this recurring?  If I understand what's happening, this is a
>    self-healing process.  So, why would it keep happening?  Are we possibly
>    using counters incorrectly?

Even after running sstableupgrade, your counter cells will not all be
upgraded until they have all been incremented. You may still be seeing the
warning on pre-2.1 counter cells that have not been incremented yet.

>    - What does it even mean that there were multiple shards for the same
>    counter?  How does that situation even occur?

We used to maintain "counter shards" at the sstable level in pre-2.1
counters. This means that on compaction or reads we would essentially add
the shards together when getting the value or merging the cells. This
caused a series of problems, including the warning you are still seeing.
TL;DR: counters in 2.1 and later store the final value of the counter (not
the increment/shard) at the commitlog level and beyond, so this is no
longer an issue. Again, read Aleksey's post
<http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters>.
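
As a rough illustration of the shard merge described above, here is a
simplified Python sketch. It is a toy model and not Cassandra's
CounterContext implementation: the (counter_id, clock, count) tuple layout is
taken from the warning text, while merge_shards and counter_value are
hypothetical names. It keeps the shard with the highest clock per id, and
when two shards share id and clock but differ in count it picks the highest
count, as the warning says.

```python
# Toy model of pre-2.1 counter shard merging; not Cassandra's actual code.
import logging
from typing import Dict, Iterable, Tuple

Shard = Tuple[str, int, int]  # (counter_id, clock, count)


def merge_shards(shards: Iterable[Shard]) -> Dict[str, Tuple[int, int]]:
    """Keep one (clock, count) per counter id: the highest clock wins."""
    merged: Dict[str, Tuple[int, int]] = {}
    for counter_id, clock, count in shards:
        if counter_id not in merged:
            merged[counter_id] = (clock, count)
            continue
        old_clock, old_count = merged[counter_id]
        if clock > old_clock:
            merged[counter_id] = (clock, count)
        elif clock == old_clock and count != old_count:
            # Same id and clock but different counts: the "invalid shard" case.
            logging.warning(
                "shards for %s differ only in count; picking highest", counter_id
            )
            merged[counter_id] = (clock, max(count, old_count))
    return merged


def counter_value(shards: Iterable[Shard]) -> int:
    """In this model the counter's value is the sum of surviving shard counts."""
    return sum(count for _, count in merge_shards(shards).values())


node = "4aa69016-4cf8-4585-8f23-e59af050d174"
print(counter_value([(node, 1, 67158), (node, 1, 21486)]))  # prints 67158
```

In this model the lower count (21486) is simply discarded, which is why the
"self-heal" can still leave the counter inaccurate.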

Many users started fresh tables after upgrading to 2.1, updated only the new
tables, and added application logic to decide which table to read from.
Something like monthly tables works well if you're doing time series
counters, and would ensure that you stop seeing the warnings on the
new/active tables and get the benefits of 2.1 counters quickly.
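
The application-side routing could look roughly like the Python sketch
below. Everything in it is an assumption made for illustration: the table
names (page_views, page_views_YYYY_MM), the cutover date, and the function
names are hypothetical, and no driver calls are shown.

```python
# Hypothetical monthly-table routing; table and function names are made up.
from datetime import date
from typing import List

CUTOVER = date(2015, 11, 1)      # assumed first month stored in new tables
LEGACY_TABLE = "page_views"      # assumed pre-2.1 table, no longer written to


def table_for(day: date) -> str:
    """Return the table that holds counters for the given day."""
    if day < CUTOVER:
        return LEGACY_TABLE
    return f"page_views_{day.year}_{day.month:02d}"


def tables_between(start: date, end: date) -> List[str]:
    """List, in order and without repeats, the tables a range read must query."""
    tables: List[str] = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        name = table_for(date(year, month, 1))
        if name not in tables:
            tables.append(name)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return tables


print(table_for(date(2015, 12, 25)))
print(tables_between(date(2015, 9, 15), date(2016, 1, 3)))
```

Writes always go to table_for(today), so only tables created on 2.1 receive
new increments; reads that span the cutover query the legacy table as well.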


All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com

On Tue, Oct 20, 2015 at 12:21 PM, Branton Davis <branton.davis@spanning.com>
wrote:

> Howdy Cassandra folks.
>
> Crickets here and it's sort of unsettling that we're alone with this
> issue.  Is it appropriate to create a JIRA issue for this or is there maybe
> another way to deal with it?
>
> Thanks!
>
> On Sun, Oct 18, 2015 at 1:55 PM, Branton Davis <branton.davis@spanning.com>
> wrote:
>
>> Hey all.
>>
>> We've been seeing this warning on one of our clusters:
>>
>> 2015-10-18 14:28:52,898 WARN  [ValidationExecutor:14]
>> org.apache.cassandra.db.context.CounterContext invalid global counter shard
>> detected; (4aa69016-4cf8-4585-8f23-e59af050d174, 1, 67158) and
>> (4aa69016-4cf8-4585-8f23-e59af050d174, 1, 21486) differ only in count; will
>> pick highest to self-heal on compaction
>>
>>
>> From what I've read and heard in the IRC channel, this warning could be
>> related to not running upgradesstables after upgrading from 2.0.x to
>> 2.1.x.  I don't think we ran that then, but we've been at 2.1 since last
>> November.  Looking back, the warnings start appearing around June, when no
>> maintenance had been performed on the cluster.  At that time, we had been
>> on 2.1.3 for a couple of months.  We've been on 2.1.10 for the last week
>> (the upgrade was when we noticed this warning for the first time).
>>
>> From a suggestion in IRC, I went ahead and ran upgradesstables on all the
>> nodes.  Our weekly repair also ran this morning.  But the warnings still
>> show up throughout the day.
>>
>> So, we have many questions:
>>
>>    - How much should we be freaking out?
>>    - Why is this recurring?  If I understand what's happening, this is a
>>    self-healing process.  So, why would it keep happening?  Are we possibly
>>    using counters incorrectly?
>>    - What does it even mean that there were multiple shards for the same
>>    counter?  How does that situation even occur?
>>
>> We're pretty lost here, so any help would be greatly appreciated.
>>
>> Thanks!
>>
>
>


--047d7b45116802ff4805228bd8d3--