Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of goudarzi@gmail.com designates
 209.85.216.53 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <E4C797B8-BFF5-4535-8F65-26E6A0F6C2A3@thelastpickle.com>
References: 
 <CAN1VBD9v88g7Pd_ALv=wOFMPzmH4Y=ZKw7EwLz1Un7-_fKnAiw@mail.gmail.com>
	<B477F811-C2B5-41F8-986A-8736F13C2FA3@thelastpickle.com>
	<84B566FB5B7B244B81E6F1FEADA9087701D2A9E565@LDNPCMMGMB01.INTRANET.BARCAPINT.COM>
	<CAN1VBD9giZvS3RNhmWuFjnYXKXYbBc5DoWzAtAGt-Oo06OrbLw@mail.gmail.com>
	<21BB7BBD-80FB-477D-8B02-209DC65069C9@thelastpickle.com>
	<CD9832CD-700C-42FB-B980-1325AB1656DE@thelastpickle.com>
	<CAN1VBD-sgEKRUEqJDuTgJ0S9X44J0SvStniUbxfRK=opT-f-ug@mail.gmail.com>
	<E4C797B8-BFF5-4535-8F65-26E6A0F6C2A3@thelastpickle.com>
Date: Mon, 1 Apr 2013 19:06:57 -0700
Message-ID: 
 <CAN1VBD8o_HRRLN3gtebkpQvuvKhiWMQrXKH1Qv9wR6WxHu9TRQ@mail.gmail.com>
Subject: Re: Lots of Deleted Rows Came back after upgrade 1.1.6 to 1.1.10
From: Arya Goudarzi <goudarzi@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=001a11c2ba80373f9804d9573186

--001a11c2ba80373f9804d9573186
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Filed
https://issues.apache.org/jira/browse/CASSANDRA-5411
https://issues.apache.org/jira/browse/CASSANDRA-5412

However, I don't think that was our issue as we don't have nodes down for a
long period of time. The longest we had a node down was for a day, and it
was replaced within few hours. I have tried very hard to reproduce this
issue by putting our production snapshot on our staging cluster and run the
upgrade from 1.1.6 to 1.1.10, but I was not successful. So, I proposed to
upgrade our cluster in production and it happened again. Now our production
includes lots of these zombies that we have to delete. Luckily our
engineers already written scripts to facilitate those issues, but why why
why C*? After 3 years of developing apps with C* and maintaining it, I have
never been this much disappointed. Anyways, I cut my nagging, but this time
before i have engineers clean up the data, I checked the timestamps of a
handful of returned rows. The timestamps where from before they were
deleted, so they are officially deleted rows that came back to life. We
have repairs running every night, unless repair is incorrectly reporting
successful repairs, I really have no clue where to start and find answers
for this strange issue except medicating myself with scotch whiskey so that
I can sleep at night, not having to think what C* is going to bring on my
desk the next morning.

-Arya


On Sun, Mar 31, 2013 at 3:09 AM, aaron morton <aaron@thelastpickle.com>wrot=
e:

> But what if the gc_grace was changed to a lower value as part of a schema
> migration after the hints have been marked with TTLs equal to the lower
> gc_grace before the migration?
>
> There would be a chance then if the tombstones had been purged.
> Want to raise a ticket ?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/03/2013, at 2:58 AM, Arya Goudarzi <goudarzi@gmail.com> wrote:
>
> I am not familiar with that part of the code yet. But what if the gc_grac=
e
> was changed to a lower value as part of a schema migration after the hint=
s
> have been marked with TTLs equal to the lower gc_grace before the
> migration?
>
> From what you've described, I think this is not an issue for us as we did
> not have a node down for a long period of time, but just pointing out wha=
t
> I think could happen based on what you've described.
>
> On Sun, Mar 24, 2013 at 10:03 AM, aaron morton <aaron@thelastpickle.com>w=
rote:
>
>> I could imagine a  scenario where a hint was replayed to a replica after
>> all replicas had purged their tombstones
>>
>> Scratch that, the hints are TTL'd with the lowest gc_grace.
>> Ticket closed https://issues.apache.org/jira/browse/CASSANDRA-5379
>>
>> Cheers
>>
>>    -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 24/03/2013, at 6:24 AM, aaron morton <aaron@thelastpickle.com> wrote:
>>
>> Beside the joke, would hinted handoff really have any role in this issue=
?
>>
>> I could imagine a  scenario where a hint was replayed to a replica after
>> all replicas had purged their tombstones. That seems like a long shot, i=
t
>> would need one node to be down for the write and all up for the delete a=
nd
>> for all of them to have purged the tombstone. But maybe we should have a
>> max age on hints so it cannot happen.
>>
>> Created https://issues.apache.org/jira/browse/CASSANDRA-5379
>>
>> Ensuring no hints are in place during an upgrade would work around. I
>> tend to make sure hints and commit log are clear during an upgrade.
>>
>> Cheers
>>
>>    -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 22/03/2013, at 7:54 AM, Arya Goudarzi <goudarzi@gmail.com> wrote:
>>
>> Beside the joke, would hinted handoff really have any role in this issue=
?
>> I have been struggling to reproduce this issue using the snapshot data
>> taken from our cluster and following the same upgrade process from 1.1.6=
 to
>> 1.1.10. I know snapshots only link to active SSTables. What if these
>> returned rows belong to some inactive SSTables and some bug exposed itse=
lf
>> and marked them as active? What are the possibilities that could lead to
>> this? I am eager to find our as this is blocking our upgrade.
>>
>> On Tue, Mar 19, 2013 at 2:11 AM, <moshe.kranc@barclays.com> wrote:
>>
>>> This obscure feature of Cassandra is called =E2=80=9Chaunted handoff=E2=
=80=9D.****
>>>
>>> ** **
>>>
>>> Happy (early) April Fools J****
>>>
>>> ** **
>>>
>>> *From:* aaron morton [mailto:aaron@thelastpickle.com]
>>> *Sent:* Monday, March 18, 2013 7:45 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: Lots of Deleted Rows Came back after upgrade 1.1.6 to
>>> 1.1.10****
>>>
>>> ** **
>>>
>>> As you see, this node thinks lots of ranges are out of sync which
>>> shouldn't be the case as successful repairs where done every night prio=
r to
>>> the upgrade. ****
>>>
>>> Could this be explained by writes occurring during the upgrade process =
?
>>> ****
>>>
>>> ** **
>>>
>>> I found this bug which touches timestamp and tomstones which was fixed
>>> in 1.1.10 but am not 100% sure if it could be related to this issue:
>>> https://issues.apache.org/jira/browse/CASSANDRA-5153****
>>>
>>> Me neither, but the issue was fixed in 1.1.0****
>>>
>>> ** **
>>>
>>>  It appears that the repair task that I executed after upgrade, brought
>>> back lots of deleted rows into life.****
>>>
>>> Was it entire rows or columns in a row?****
>>>
>>> Do you know if row level or column level deletes were used ? ****
>>>
>>> ** **
>>>
>>> Can you look at the data in cassanca-cli and confirm the timestamps on
>>> the columns make sense ?  ****
>>>
>>> ** **
>>>
>>> Cheers****
>>>
>>> ** **
>>>
>>> -----------------****
>>>
>>> Aaron Morton****
>>>
>>> Freelance Cassandra Consultant****
>>>
>>> New Zealand****
>>>
>>> ** **
>>>
>>> @aaronmorton****
>>>
>>> http://www.thelastpickle.com****
>>>
>>> ** **
>>>
>>> On 16/03/2013, at 2:31 PM, Arya Goudarzi <goudarzi@gmail.com> wrote:***=
*
>>>
>>>
>>>
>>> ****
>>>
>>> Hi,****
>>>
>>> ** **
>>>
>>> I have upgraded our test cluster from 1.1.6 to 1.1.10. Followed by
>>> running repairs. It appears that the repair task that I executed after
>>> upgrade, brought back lots of deleted rows into life. Here are some
>>> logistics:****
>>>
>>> ** **
>>>
>>> - The upgraded cluster started from 1.1.1 -> 1.1.2 -> 1.1.5 -> 1.1.6 **=
*
>>> *
>>>
>>> - Old cluster: 4 node, C* 1.1.6 with RF3 using NetworkTopology;****
>>>
>>> - Upgrade to : 1.1.10 with all other settings the same;****
>>>
>>> - Successful repairs were being done on this cluster every night;****
>>>
>>> - Our clients use nanosecond precision timestamp for cassandra calls;**=
*
>>> *
>>>
>>> - After upgrade, while running repair I say some log messages like this
>>> in one node:****
>>>
>>> ** **
>>>
>>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,847
>>> AntiEntropyService.java (line 1022) [repair
>>> #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and /
>>> 23.20.207.56 have 2223 range(s) out of sync for App****
>>>
>>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,877
>>> AntiEntropyService.java (line 1022) [repair
>>> #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.250.43 and /
>>> 23.20.207.56 have 161 range(s) out of sync for App****
>>>
>>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:55,097
>>> AntiEntropyService.java (line 1022) [repair
>>> #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and /
>>> 23.20.250.43 have 2294 range(s) out of sync for App****
>>>
>>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:59,190
>>> AntiEntropyService.java (line 789) [repair
>>> #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] App is fully synced (13 remainin=
g
>>> column family to sync for this session)****
>>>
>>> ** **
>>>
>>> As you see, this node thinks lots of ranges are out of sync which
>>> shouldn't be the case as successful repairs where done every night prio=
r to
>>> the upgrade. ****
>>>
>>> ** **
>>>
>>> The App CF uses SizeTiered with gc_grace of 10 days. It has caching =3D
>>> 'ALL', and it is fairly small (11Mb on each node).****
>>>
>>> ** **
>>>
>>> I found this bug which touches timestamp and tomstones which was fixed
>>> in 1.1.10 but am not 100% sure if it could be related to this issue:
>>> https://issues.apache.org/jira/browse/CASSANDRA-5153****
>>>
>>> ** **
>>>
>>> Any advice on how to dig deeper into this would be appreciated.****
>>>
>>> ** **
>>>
>>> Thanks,****
>>>
>>> -Arya****
>>>
>>> ** **
>>>
>>> ** **
>>>
>>> ** **
>>>
>>> _______________________________________________
>>>
>>> This message may contain information that is confidential or privileged=
.
>>> If you are not an intended recipient of this message, please delete it =
and
>>> any attachments, and notify the sender that you have received it in err=
or.
>>> Unless specifically stated in the message or otherwise indicated, you m=
ay
>>> not duplicate, redistribute or forward this message or any portion ther=
eof,
>>> including any attachments, by any means to any other person, including =
any
>>> retail investor or customer. This message is not a recommendation, advi=
ce,
>>> offer or solicitation, to buy/sell any product or service, and is not a=
n
>>> official confirmation of any transaction. Any opinions presented are so=
lely
>>> those of the author and do not necessarily represent those of Barclays.
>>> This message is subject to terms available at:
>>> www.barclays.com/emaildisclaimer and, if received from Barclays' Sales
>>> or Trading desk, the terms available at:
>>> www.barclays.com/salesandtradingdisclaimer/. By messaging with Barclays
>>> you consent to the foregoing. Barclays Bank PLC is a company registered=
 in
>>> England (number 1026167) with its registered office at 1 Churchill Plac=
e,
>>> London, E14 5HP. This email may relate to or be sent from other members=
 of
>>> the Barclays group.
>>>
>>> _______________________________________________
>>>
>>
>>
>>
>>
>
>

--001a11c2ba80373f9804d9573186
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Filed=C2=A0<div><a href=3D"https://issues.apache.org/jira/=
browse/CASSANDRA-5411">https://issues.apache.org/jira/browse/CASSANDRA-5411=
</a>=C2=A0</div><div><a href=3D"https://issues.apache.org/jira/browse/CASSA=
NDRA-5412">https://issues.apache.org/jira/browse/CASSANDRA-5412</a><br>
<div><br></div><div style>However, I don&#39;t think that was our issue as =
we don&#39;t have nodes down for a long period of time. The longest we had =
a node down was for a day, and it was replaced within few hours. I have tri=
ed very hard to reproduce this issue by putting our production snapshot on =
our staging cluster and run the upgrade from 1.1.6 to 1.1.10, but I was not=
 successful. So, I proposed to upgrade our cluster in production and it hap=
pened again. Now our production includes lots of these zombies that we have=
 to delete. Luckily our engineers already written scripts to facilitate tho=
se issues, but why why why C*? After 3 years of developing apps with C* and=
 maintaining it, I have never been this much disappointed. Anyways, I cut m=
y nagging, but this time before i have engineers clean up the data, I check=
ed the timestamps of a handful of returned rows. The timestamps where from =
before they were deleted, so they are officially deleted rows that came bac=
k to life. We have repairs running every night, unless repair is=C2=A0incor=
rectly=C2=A0reporting successful repairs, I really have no clue where to st=
art and find answers for this strange issue except medicating myself with s=
cotch=C2=A0whiskey so that I can sleep at night, not having to think what C=
* is going to bring on my desk the next morning.=C2=A0=C2=A0</div>
<div style><br></div><div style>-Arya</div></div></div><div class=3D"gmail_=
extra"><br><br><div class=3D"gmail_quote">On Sun, Mar 31, 2013 at 3:09 AM, =
aaron morton <span dir=3D"ltr">&lt;<a href=3D"mailto:aaron@thelastpickle.co=
m" target=3D"_blank">aaron@thelastpickle.com</a>&gt;</span> wrote:<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex"><div style=3D"word-wrap:break-word"><div cla=
ss=3D"im"><blockquote type=3D"cite">But what if the gc_grace was changed to=
 a lower value as part of a schema migration after the hints have been mark=
ed with TTLs equal to the lower gc_grace before the migration?=C2=A0</block=
quote>
</div>There would be a chance then if the tombstones had been purged.=C2=A0=
<div>Want to raise a ticket ?=C2=A0</div><div><br></div><div>Cheers</div><d=
iv><br></div><div><div class=3D"im"><div>
<div style=3D"text-indent:0px;letter-spacing:normal;font-variant:normal;tex=
t-align:-webkit-auto;font-style:normal;font-weight:normal;line-height:norma=
l;text-transform:none;font-size:medium;white-space:normal;font-family:Helve=
tica;word-wrap:break-word;word-spacing:0px">
<div style=3D"text-indent:0px;letter-spacing:normal;font-variant:normal;tex=
t-align:-webkit-auto;font-style:normal;font-weight:normal;line-height:norma=
l;text-transform:none;font-size:medium;white-space:normal;font-family:Helve=
tica;word-wrap:break-word;word-spacing:0px">
<span style=3D"border-spacing:0px;text-indent:0px;letter-spacing:normal;fon=
t-variant:normal;text-align:-webkit-auto;font-style:normal;font-weight:norm=
al;line-height:normal;border-collapse:separate;text-transform:none;font-siz=
e:medium;white-space:normal;font-family:Helvetica;word-spacing:0px"><div st=
yle=3D"word-wrap:break-word">
<span style=3D"border-spacing:0px;text-indent:0px;letter-spacing:normal;fon=
t-variant:normal;font-style:normal;font-weight:normal;line-height:normal;bo=
rder-collapse:separate;text-transform:none;font-size:medium;white-space:nor=
mal;font-family:Helvetica;word-spacing:0px"><div style=3D"word-wrap:break-w=
ord">
<span style=3D"border-spacing:0px;text-indent:0px;letter-spacing:normal;fon=
t-variant:normal;font-style:normal;font-weight:normal;line-height:normal;bo=
rder-collapse:separate;text-transform:none;font-size:medium;white-space:nor=
mal;font-family:Helvetica;word-spacing:0px"><div style=3D"word-wrap:break-w=
ord">
<span style=3D"border-spacing:0px;text-indent:0px;letter-spacing:normal;fon=
t-variant:normal;font-style:normal;font-weight:normal;line-height:normal;bo=
rder-collapse:separate;text-transform:none;font-size:medium;white-space:nor=
mal;font-family:Helvetica;word-spacing:0px"><div style=3D"word-wrap:break-w=
ord">
<div>-----------------</div><div>Aaron Morton</div><div>Freelance Cassandra=
 Consultant</div><div>New Zealand</div><div><br></div><div>@aaronmorton</di=
v><div><a href=3D"http://www.thelastpickle.com" target=3D"_blank">http://ww=
w.thelastpickle.com</a></div>
</div></span></div></span></div></span></div></span></div></div>
</div>

<br></div><div><div class=3D"h5"><div><div>On 29/03/2013, at 2:58 AM, Arya =
Goudarzi &lt;<a href=3D"mailto:goudarzi@gmail.com" target=3D"_blank">goudar=
zi@gmail.com</a>&gt; wrote:</div><br><blockquote type=3D"cite">I am not fam=
iliar with that part of the code yet. But what if the gc_grace was changed =
to a lower value as part of a schema migration after the hints have been ma=
rked with TTLs equal to the lower gc_grace before the migration?=C2=A0<div>

<br></div><div>From what you&#39;ve described, I think this is not an issue=
 for us as we did not have a node down for a long period of time, but just =
pointing out what I think could happen based on what you&#39;ve described.<=
br>

<br><div class=3D"gmail_quote">On Sun, Mar 24, 2013 at 10:03 AM, aaron mort=
on <span dir=3D"ltr">&lt;<a href=3D"mailto:aaron@thelastpickle.com" target=
=3D"_blank">aaron@thelastpickle.com</a>&gt;</span> wrote:<br><blockquote cl=
ass=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;p=
adding-left:1ex">

<div style=3D"word-wrap:break-word"><div><blockquote type=3D"cite"><div sty=
le=3D"word-wrap:break-word">I could imagine a =C2=A0scenario where a hint w=
as replayed to a replica after all replicas had purged their tombstones</di=
v>
</blockquote></div>Scratch that, the hints are TTL&#39;d with the lowest gc=
_grace.=C2=A0<div>Ticket closed=C2=A0<a href=3D"https://issues.apache.org/j=
ira/browse/CASSANDRA-5379" target=3D"_blank">https://issues.apache.org/jira=
/browse/CASSANDRA-5379</a></div>

<div><br></div><div>Cheers</div><div><br></div><div><div><div>
<div style=3D"text-indent:0px;letter-spacing:normal;font-variant:normal;tex=
t-align:-webkit-auto;font-style:normal;font-weight:normal;line-height:norma=
l;text-transform:none;font-size:medium;white-space:normal;font-family:Helve=
tica;word-wrap:break-word;word-spacing:0px">

<div style=3D"text-indent:0px;letter-spacing:normal;font-variant:normal;tex=
t-align:-webkit-auto;font-style:normal;font-weight:normal;line-height:norma=
l;text-transform:none;font-size:medium;white-space:normal;font-family:Helve=
tica;word-wrap:break-word;word-spacing:0px">

<span style=3D"border-spacing:0px;text-indent:0px;letter-spacing:normal;fon=
t-variant:normal;text-align:-webkit-auto;font-style:normal;font-weight:norm=
al;line-height:normal;border-collapse:separate;text-transform:none;font-siz=
e:medium;white-space:normal;font-family:Helvetica;word-spacing:0px"><div st=
yle=3D"word-wrap:break-word">

<span style=3D"border-spacing:0px;text-indent:0px;letter-spacing:normal;fon=
t-variant:normal;font-style:normal;font-weight:normal;line-height:normal;bo=
rder-collapse:separate;text-transform:none;font-size:medium;white-space:nor=
mal;font-family:Helvetica;word-spacing:0px"><div style=3D"word-wrap:break-w=
ord">

<span style=3D"border-spacing:0px;text-indent:0px;letter-spacing:normal;fon=
t-variant:normal;font-style:normal;font-weight:normal;line-height:normal;bo=
rder-collapse:separate;text-transform:none;font-size:medium;white-space:nor=
mal;font-family:Helvetica;word-spacing:0px"><div style=3D"word-wrap:break-w=
ord">

<span style=3D"border-spacing:0px;text-indent:0px;letter-spacing:normal;fon=
t-variant:normal;font-style:normal;font-weight:normal;line-height:normal;bo=
rder-collapse:separate;text-transform:none;font-size:medium;white-space:nor=
mal;font-family:Helvetica;word-spacing:0px"><div style=3D"word-wrap:break-w=
ord">

<div>-----------------</div><div>Aaron Morton</div><div>Freelance Cassandra=
 Consultant</div><div>New Zealand</div><div><br></div><div>@aaronmorton</di=
v><div><a href=3D"http://www.thelastpickle.com/" target=3D"_blank">http://w=
ww.thelastpickle.com</a></div>

</div></span></div></span></div></span></div></span></div></div>
</div>

<br></div><div><div><div><div>On 24/03/2013, at 6:24 AM, aaron morton &lt;<=
a href=3D"mailto:aaron@thelastpickle.com" target=3D"_blank">aaron@thelastpi=
ckle.com</a>&gt; wrote:</div><br><blockquote type=3D"cite"><div style=3D"wo=
rd-wrap:break-word">

<blockquote type=3D"cite">Beside the joke, would hinted handoff really have=
 any role in this issue?</blockquote>I could imagine a =C2=A0scenario where=
 a hint was replayed to a replica after all replicas had purged their tombs=
tones. That seems like a long shot, it would need one node to be down for t=
he write and all up for the delete and for all of them to have purged the t=
ombstone. But maybe we should have a max age on hints so it cannot happen.=
=C2=A0<div>

<br></div><div>Created=C2=A0<a href=3D"https://issues.apache.org/jira/brows=
e/CASSANDRA-5379" target=3D"_blank">https://issues.apache.org/jira/browse/C=
ASSANDRA-5379</a></div><div><br></div><div>Ensuring no hints are in place d=
uring an upgrade would work around. I tend to make sure hints and commit lo=
g are clear during an upgrade.=C2=A0</div>

<div><br></div><div>Cheers</div><div><br><div>
<div style=3D"font-family:Helvetica;font-size:medium;font-style:normal;font=
-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal=
;text-align:-webkit-auto;text-indent:0px;text-transform:none;white-space:no=
rmal;word-spacing:0px;word-wrap:break-word">

<div style=3D"font-family:Helvetica;font-size:medium;font-style:normal;font=
-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal=
;text-align:-webkit-auto;text-indent:0px;text-transform:none;white-space:no=
rmal;word-spacing:0px;word-wrap:break-word">

<span style=3D"border-collapse:separate;font-family:Helvetica;font-style:no=
rmal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-heig=
ht:normal;text-align:-webkit-auto;text-indent:0px;text-transform:none;white=
-space:normal;word-spacing:0px;border-spacing:0px;font-size:medium"><div st=
yle=3D"word-wrap:break-word">

<span style=3D"border-collapse:separate;font-family:Helvetica;font-style:no=
rmal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-heig=
ht:normal;text-indent:0px;text-transform:none;white-space:normal;word-spaci=
ng:0px;border-spacing:0px;font-size:medium"><div style=3D"word-wrap:break-w=
ord">

<span style=3D"border-collapse:separate;font-family:Helvetica;font-style:no=
rmal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-heig=
ht:normal;text-indent:0px;text-transform:none;white-space:normal;word-spaci=
ng:0px;border-spacing:0px;font-size:medium"><div style=3D"word-wrap:break-w=
ord">

<span style=3D"border-collapse:separate;font-family:Helvetica;font-style:no=
rmal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-heig=
ht:normal;text-indent:0px;text-transform:none;white-space:normal;word-spaci=
ng:0px;border-spacing:0px;font-size:medium"><div style=3D"word-wrap:break-w=
ord">

<div>-----------------</div><div>Aaron Morton</div><div>Freelance Cassandra=
 Consultant</div><div>New Zealand</div><div><br></div><div>@aaronmorton</di=
v><div><a href=3D"http://www.thelastpickle.com/" target=3D"_blank">http://w=
ww.thelastpickle.com</a></div>

</div></span></div></span></div></span></div></span></div></div>
</div>

<br><div><div>On 22/03/2013, at 7:54 AM, Arya Goudarzi &lt;<a href=3D"mailt=
o:goudarzi@gmail.com" target=3D"_blank">goudarzi@gmail.com</a>&gt; wrote:</=
div><br><blockquote type=3D"cite">Beside the joke, would hinted handoff rea=
lly have any role in this issue? I have been struggling to reproduce this i=
ssue using the snapshot data taken from our cluster and following the same =
upgrade process from 1.1.6 to 1.1.10. I know snapshots only link to active =
SSTables. What if these returned rows belong to some inactive SSTables and =
some bug exposed itself and marked them as active? What are the possibiliti=
es that could lead to this? I am eager to find our as this is blocking our =
upgrade.<br>


<br><div class=3D"gmail_quote">On Tue, Mar 19, 2013 at 2:11 AM,  <span dir=
=3D"ltr">&lt;<a href=3D"mailto:moshe.kranc@barclays.com" target=3D"_blank">=
moshe.kranc@barclays.com</a>&gt;</span> wrote:<br><blockquote class=3D"gmai=
l_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left=
:1ex">


<div lang=3D"EN-GB" link=3D"blue" vlink=3D"purple" style=3D"word-wrap:break=
-word"><div><p class=3D"MsoNormal"><span style=3D"font-size:11.0pt;font-fam=
ily:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d">This obscure =
feature of Cassandra is called =E2=80=9Chaunted handoff=E2=80=9D.<u></u><u>=
</u></span></p>
<p class=3D"MsoNormal"><span style=3D"font-size:11.0pt;font-family:&quot;Ca=
libri&quot;,&quot;sans-serif&quot;;color:#1f497d"><u></u>=C2=A0<u></u></spa=
n></p><p class=3D"MsoNormal"><span style=3D"font-size:11.0pt;font-family:&q=
uot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d">Happy (early) April=
 Fools </span><span style=3D"font-size:11.0pt;font-family:Wingdings;color:#=
1f497d">J</span><span style=3D"font-size:11.0pt;font-family:&quot;Calibri&q=
uot;,&quot;sans-serif&quot;;color:#1f497d"><u></u><u></u></span></p>
<p class=3D"MsoNormal"><span style=3D"font-size:11.0pt;font-family:&quot;Ca=
libri&quot;,&quot;sans-serif&quot;;color:#1f497d"><u></u>=C2=A0<u></u></spa=
n></p><div><div style=3D"border:none;border-top:solid #b5c4df 1.0pt;padding=
:3.0pt 0cm 0cm 0cm">
<p class=3D"MsoNormal"><b><span lang=3D"EN-US" style=3D"font-size:10.0pt;fo=
nt-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;">From:</span></b><span =
lang=3D"EN-US" style=3D"font-size:10.0pt;font-family:&quot;Tahoma&quot;,&qu=
ot;sans-serif&quot;"> aaron morton [mailto:<a href=3D"mailto:aaron@thelastp=
ickle.com" target=3D"_blank">aaron@thelastpickle.com</a>] <br>


<b>Sent:</b> Monday, March 18, 2013 7:45 PM<br><b>To:</b> <a href=3D"mailto=
:user@cassandra.apache.org" target=3D"_blank">user@cassandra.apache.org</a>=
<br><b>Subject:</b> Re: Lots of Deleted Rows Came back after upgrade 1.1.6 =
to 1.1.10<u></u><u></u></span></p>


</div></div><div><p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p><blockquote=
 style=3D"margin-top:5.0pt;margin-bottom:5.0pt"><p class=3D"MsoNormal">As y=
ou see, this node thinks lots of ranges are out of sync which shouldn&#39;t=
 be the case as successful repairs where done every night prior to the upgr=
ade.=C2=A0<u></u><u></u></p>


</blockquote><div><p class=3D"MsoNormal">Could this be explained by writes =
occurring during the upgrade process ?=C2=A0<u></u><u></u></p></div><div><p=
 class=3D"MsoNormal"><u></u>=C2=A0<u></u></p></div><div><blockquote style=
=3D"margin-top:5.0pt;margin-bottom:5.0pt">
<p class=3D"MsoNormal">I found this bug which touches timestamp and tomston=
es which was fixed in 1.1.10 but am not 100% sure if it could be related to=
 this issue:=C2=A0<a href=3D"https://issues.apache.org/jira/browse/CASSANDR=
A-5153" target=3D"_blank">https://issues.apache.org/jira/browse/CASSANDRA-5=
153</a><u></u><u></u></p>


</blockquote><p class=3D"MsoNormal">Me neither, but the issue was fixed in =
1.1.0<u></u><u></u></p></div><div><p class=3D"MsoNormal"><u></u>=C2=A0<u></=
u></p></div><div><blockquote style=3D"margin-top:5.0pt;margin-bottom:5.0pt"=
><p class=3D"MsoNormal">


=C2=A0It appears that the repair task that I executed after upgrade, brough=
t back lots of deleted rows into life.<u></u><u></u></p></blockquote><p cla=
ss=3D"MsoNormal">Was it entire rows or columns in a row?<u></u><u></u></p><=
/div>


<div><p class=3D"MsoNormal">Do you know if row level or column level delete=
s were used ?=C2=A0<u></u><u></u></p></div><div><p class=3D"MsoNormal"><u><=
/u>=C2=A0<u></u></p></div><div><p class=3D"MsoNormal">Can you look at the d=
ata in cassanca-cli and confirm the timestamps on the columns make sense ? =
=C2=A0<u></u><u></u></p>


</div><div><p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p></div><div><p cla=
ss=3D"MsoNormal">Cheers<u></u><u></u></p></div><div><p class=3D"MsoNormal">=
<u></u>=C2=A0<u></u></p></div><div><div><p class=3D"MsoNormal">
<span style=3D"font-size:13.5pt;font-family:&quot;Helvetica&quot;,&quot;san=
s-serif&quot;">-----------------<u></u><u></u></span></p></div><div><p clas=
s=3D"MsoNormal"><span style=3D"font-size:13.5pt;font-family:&quot;Helvetica=
&quot;,&quot;sans-serif&quot;">Aaron Morton<u></u><u></u></span></p>


</div><div><p class=3D"MsoNormal"><span style=3D"font-size:13.5pt;font-fami=
ly:&quot;Helvetica&quot;,&quot;sans-serif&quot;">Freelance Cassandra Consul=
tant<u></u><u></u></span></p></div><div><p class=3D"MsoNormal"><span style=
=3D"font-size:13.5pt;font-family:&quot;Helvetica&quot;,&quot;sans-serif&quo=
t;">New Zealand<u></u><u></u></span></p>


</div><div><p class=3D"MsoNormal"><span style=3D"font-size:13.5pt;font-fami=
ly:&quot;Helvetica&quot;,&quot;sans-serif&quot;"><u></u>=C2=A0<u></u></span=
></p></div><div><p class=3D"MsoNormal"><span style=3D"font-size:13.5pt;font=
-family:&quot;Helvetica&quot;,&quot;sans-serif&quot;">@aaronmorton<u></u><u=
></u></span></p>


</div><div><p class=3D"MsoNormal"><span style=3D"font-size:13.5pt;font-fami=
ly:&quot;Helvetica&quot;,&quot;sans-serif&quot;"><a href=3D"http://www.thel=
astpickle.com/" target=3D"_blank">http://www.thelastpickle.com</a><u></u><u=
></u></span></p>


</div></div><p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p><div><div><p cla=
ss=3D"MsoNormal">On 16/03/2013, at 2:31 PM, Arya Goudarzi &lt;<a href=3D"ma=
ilto:goudarzi@gmail.com" target=3D"_blank">goudarzi@gmail.com</a>&gt; wrote=
:<u></u><u></u></p>


</div><p class=3D"MsoNormal"><br><br><u></u><u></u></p><p class=3D"MsoNorma=
l">Hi,<u></u><u></u></p><div><p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p=
></div><div><p class=3D"MsoNormal">I have upgraded our test cluster from 1.=
1.6 to 1.1.10. Followed by running repairs. It appears that the repair task=
 that I executed after upgrade, brought back lots of deleted rows into life=
. Here are some logistics:<u></u><u></u></p>


</div><div><p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p></div><div><p cla=
ss=3D"MsoNormal">- The upgraded cluster started from 1.1.1 -&gt; 1.1.2 -&gt=
; 1.1.5 -&gt; 1.1.6=C2=A0<u></u><u></u></p></div><div><p class=3D"MsoNormal=
">- Old cluster: 4 node, C* 1.1.6 with RF3 using NetworkTopology;<u></u><u>=
</u></p>


</div><div><p class=3D"MsoNormal">- Upgrade to : 1.1.10 with all other sett=
ings the same;<u></u><u></u></p></div><div><p class=3D"MsoNormal">- Success=
ful repairs were being done on this cluster every night;<u></u><u></u></p>

</div>
<div><p class=3D"MsoNormal">- Our clients use nanosecond precision timestam=
p for cassandra calls;<u></u><u></u></p></div><div><p class=3D"MsoNormal">-=
 After upgrade, while running repair I say some log messages like this in o=
ne node:<u></u><u></u></p>


</div><div><p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p></div><div><div><=
p class=3D"MsoNormal">system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19=
:55:54,847 AntiEntropyService.java (line 1022) [repair #0990f320-8da9-11e2-=
0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and /<a href=3D"http://23.20.207.56=
/" target=3D"_blank">23.20.207.56</a> have 2223 range(s) out of sync for Ap=
p<u></u><u></u></p>


</div><div><p class=3D"MsoNormal">system.log.5: INFO [AntiEntropyStage:1] 2=
013-03-15 19:55:54,877 AntiEntropyService.java (line 1022) [repair #0990f32=
0-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.250.43 and /<a href=3D"http://=
23.20.207.56/" target=3D"_blank">23.20.207.56</a> have 161 range(s) out of =
sync for App<u></u><u></u></p>


</div><div><p class=3D"MsoNormal">system.log.5: INFO [AntiEntropyStage:1] 2=
013-03-15 19:55:55,097 AntiEntropyService.java (line 1022) [repair #0990f32=
0-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and /<a href=3D"http://=
23.20.250.43/" target=3D"_blank">23.20.250.43</a> have 2294 range(s) out of=
 sync for App<u></u><u></u></p>


</div><div><p class=3D"MsoNormal">system.log.5: INFO [AntiEntropyStage:1] 2=
013-03-15 19:55:59,190 AntiEntropyService.java (line 789) [repair #0990f320=
-8da9-11e2-0000-e9b2bd8ea1bd] App is fully synced (13 remaining column fami=
ly to sync for this session)<u></u><u></u></p>


</div></div><div><p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p></div><div>=
<p class=3D"MsoNormal">As you see, this node thinks lots of ranges are out =
of sync which shouldn&#39;t be the case as successful repairs where done ev=
ery night prior to the upgrade.=C2=A0<u></u><u></u></p>


</div><div><p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p></div><div><p cla=
ss=3D"MsoNormal">The App CF uses SizeTiered with gc_grace of 10 days. It ha=
s caching =3D &#39;ALL&#39;, and it is fairly small (11Mb on each node).<u>=
</u><u></u></p>


</div><div><p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p></div><div><p cla=
ss=3D"MsoNormal">I found this bug which touches timestamp and tomstones whi=
ch was fixed in 1.1.10 but am not 100% sure if it could be related to this =
issue:=C2=A0<a href=3D"https://issues.apache.org/jira/browse/CASSANDRA-5153=
" target=3D"_blank">https://issues.apache.org/jira/browse/CASSANDRA-5153</a=
><u></u><u></u></p>


</div><div><p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p></div><div><p cla=
ss=3D"MsoNormal">Any advice on how to dig deeper into this would be appreci=
ated.<u></u><u></u></p></div><div><p class=3D"MsoNormal"><u></u>=C2=A0<u></=
u></p></div>


<div><p class=3D"MsoNormal">Thanks,<u></u><u></u></p></div><div><p class=3D=
"MsoNormal">-Arya<u></u><u></u></p></div><div><p class=3D"MsoNormal"><u></u=
>=C2=A0<u></u></p></div><div><p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p=
></div></div>
<p class=3D"MsoNormal">
<u></u>=C2=A0<u></u></p></div></div><p>____________________________________=
___________</p><p>This message may contain information that is confidential=
 or privileged. If=20
you are not an intended recipient of this message, please delete it and any=
=20
attachments, and notify the sender that you have received it in error. Unle=
ss=20
specifically stated in the message or otherwise indicated, you may not=20
duplicate, redistribute or forward this message or any portion thereof,=20
including any attachments, by any means to any other person, including any=
=20
retail investor or customer. This message is not a recommendation, advice, =
offer=20
or solicitation, to buy/sell any product or service, and is not an official=
=20
confirmation of any transaction. Any opinions presented are solely those of=
 the=20
author and do not necessarily represent those of Barclays. This message is=
=20
subject to terms available at: <a href=3D"http://www.barclays.com/emaildisc=
laimer" target=3D"_blank">www.barclays.com/emaildisclaimer</a>=20
and, if received from Barclays&#39; Sales or Trading desk, the terms availa=
ble at:=20
<a href=3D"http://www.barclays.com/salesandtradingdisclaimer/" target=3D"_b=
lank">www.barclays.com/salesandtradingdisclaimer/</a>.=20
By messaging with Barclays you consent to the foregoing. Barclays Bank PLC =
is a=20
company registered in England (number 1026167) with its registered office a=
t 1=20
Churchill Place, London, E14 5HP. This email may relate to or be sent from =
other=20
members of the Barclays group.</p><p>______________________________________=
_________</p>
</div>
</blockquote></div><br>
</blockquote></div><br></div></div></blockquote></div><br></div></div></div=
></div></blockquote></div><br></div>
</blockquote></div><br></div></div></div></div></blockquote></div><br></div=
>

--001a11c2ba80373f9804d9573186--