From: Zhong Li <zli@voxeo.com>
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Message-Id: <9F3DF474-9496-4927-BC5C-E55E221B31D1@voxeo.com>
Subject: Re: data deleted came back after 9 days.
Date: Tue, 17 Aug 2010 22:49:04 -0400

Those data were inserted on one node, then deleted on a remote node less than 2 seconds later. So it is very possible that some node lost the tombstone when the connection was lost. My question: can a ConsistencyLevel.ALL read retrieve the lost tombstone, instead of running repair?

On Aug 17, 2010, at 4:11 PM, Ned Wolpert wrote:

> (gurus, please check my logic here... I'm trying to validate my
> understanding of this situation.)
>
> Isn't the issue that while a server was disconnected, a delete could
> have occurred, and thus the disconnected server never got the
> 'tombstone'? (http://wiki.apache.org/cassandra/DistributedDeletes)
> When it comes back, only after it receives the delete request will
> the data be deleted from the reconnected server. I do not think this
> happens automatically when the server rejoins the cluster; it
> requires the manual repair command.
>
> From my understanding, if the consistency level is greater than the
> number of servers missing that tombstone, you'll get the correct
> data. If it's less, then you could get either the right or the wrong
> answer. So the issue is how often you need to run repair. If you
> have ReplicationFactor=3 and you use ConsistencyLevel.QUORUM (2
> responses), then you need to run it after one server fails just to
> be sure. If you can tolerate some inconsistency, you can wait a bit
> longer before running the repair.
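Ned's consistency-level arithmetic can be sketched as a toy model. This is illustrative Python only, not Cassandra code: the replica states and the `read()` helper are hypothetical, and the model assumes reconciliation always favors the (newer) tombstone and ignores gc_grace expiry:

```python
# Toy model: RF=3, one replica missed the delete. A read resolves to
# "deleted" only if its read set contains at least one replica that
# still holds the tombstone.

from itertools import combinations

RF = 3
replicas = ["has_tombstone", "has_tombstone", "missed_tombstone"]

def read(consistency):
    """True if *every* possible read set at this level sees the delete."""
    needed = {"ONE": 1, "QUORUM": RF // 2 + 1, "ALL": RF}[consistency]
    return all("has_tombstone" in subset
               for subset in combinations(replicas, needed))

for level in ("ONE", "QUORUM", "ALL"):
    print(level, read(level))
# ONE may contact only the stale replica, so deleted data can resurface.
# QUORUM (2 of 3) always overlaps a tombstone-bearing replica here, but
# only because exactly one replica missed the delete.
# ALL touches every replica, so the tombstone always wins -- matching
# the question about whether a ConsistencyLevel.ALL read can recover it.
```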
>
> On Tue, Aug 17, 2010 at 12:58 PM, Jeremy Dunck <jdunck@gmail.com> wrote:
> On Tue, Aug 17, 2010 at 2:49 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> > It doesn't have to be disconnected more than GC grace seconds to
> > cause what you are seeing, it just has to be disconnected at all
> > (thus missing delete commands).
> >
> > Thus you need to be running repair more often than gcgrace, or
> > confident that read repair will handle it for you (which clearly is
> > not the case for you :).  see
> > http://wiki.apache.org/cassandra/Operations
>
> FWIW, the docs there say:
> "Remember though that if a node is down longer than your configured
> GCGraceSeconds (default: 10 days), it could have missed remove
> operations permanently"
>
> So that's probably a source of misunderstanding.
>
> --
> Virtually, Ned Wolpert
>
> "Settle thy studies, Faustus, and begin..."   --Marlowe
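Jonathan's advice above reduces to simple arithmetic: every node must be fully repaired within GCGraceSeconds of any delete, or a tombstone can be garbage-collected before it ever reaches a replica that missed it. A minimal sketch (the safety factor is an arbitrary illustrative choice, not a Cassandra setting):

```python
# Rule of thumb from the thread: run repair more often than
# GCGraceSeconds, or a replica that missed a delete may never learn of
# the tombstone before it is purged -- and the deleted data can then
# "come back" from the stale replica.

GC_GRACE_SECONDS = 10 * 24 * 3600   # Cassandra's default: 10 days
SAFETY_FACTOR = 0.5                 # assumed margin: repair twice per grace period

max_repair_interval_days = GC_GRACE_SECONDS * SAFETY_FACTOR / 86400
print(max_repair_interval_days)     # 5.0
```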