From: Zhong Li <zli@voxeo.com>
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Message-Id: <9F3DF474-9496-4927-BC5C-E55E221B31D1@voxeo.com>
Subject: Re: data deleted came back after 9 days.
Date: Tue, 17 Aug 2010 22:49:04 -0400

Those data were inserted on one node, then deleted on a remote node less than 2 seconds later. So it is very possible that some node lost the tombstone when the connection was lost. My question: can a ConsistencyLevel.ALL read retrieve the lost tombstone, instead of running repair?

On Aug 17, 2010, at 4:11 PM, Ned Wolpert wrote:

> (gurus, please check my logic here... I'm trying to validate my
> understanding of this situation.)
>
> Isn't the issue that while a server was disconnected, a delete could
> have occurred, and thus the disconnected server never got the
> 'tombstone'? (http://wiki.apache.org/cassandra/DistributedDeletes)
> When it comes back, only after it receives the delete request will
> the data be deleted from the reconnected server. I do not think this
> happens automatically when the server rejoins the cluster; it
> requires the manual repair command.
>
> From my understanding, if the consistency level is greater than the
> number of servers missing that tombstone, you'll get the correct
> data. If it's less, then you could get either the right or the wrong
> answer. So the issue is how often you need to run repair. If you
> have ReplicationFactor=3 and you use ConsistencyLevel.QUORUM (2
> responses), then you need to run it after one server fails just to
> be sure. If you can tolerate some inconsistency, you can wait a bit
> longer before running the repair.
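Ned's consistency-level arithmetic can be sketched as a toy model. This is illustrative Python only, not Cassandra code: the replica states and the `read()` helper are hypothetical, and the model assumes reconciliation always favors the (newer) tombstone and ignores gc_grace expiry:

```python
# Toy model: RF=3, one replica missed the delete. A read resolves to
# "deleted" only if its read set contains at least one replica that
# still holds the tombstone.

from itertools import combinations

RF = 3
replicas = ["has_tombstone", "has_tombstone", "missed_tombstone"]

def read(consistency):
    """True if *every* possible read set at this level sees the delete."""
    needed = {"ONE": 1, "QUORUM": RF // 2 + 1, "ALL": RF}[consistency]
    return all("has_tombstone" in subset
               for subset in combinations(replicas, needed))

for level in ("ONE", "QUORUM", "ALL"):
    print(level, read(level))
# ONE may contact only the stale replica, so deleted data can resurface.
# QUORUM (2 of 3) always overlaps a tombstone-bearing replica here, but
# only because exactly one replica missed the delete.
# ALL touches every replica, so the tombstone always wins -- matching
# the question about whether a ConsistencyLevel.ALL read can recover it.
```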
>
> On Tue, Aug 17, 2010 at 12:58 PM, Jeremy Dunck <jdunck@gmail.com> wrote:
> On Tue, Aug 17, 2010 at 2:49 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> > It doesn't have to be disconnected more than GC grace seconds to
> > cause what you are seeing, it just has to be disconnected at all
> > (thus missing delete commands).
> >
> > Thus you need to be running repair more often than gcgrace, or
> > confident that read repair will handle it for you (which clearly is
> > not the case for you :).  see
> > http://wiki.apache.org/cassandra/Operations
>
> FWIW, the docs there say:
> "Remember though that if a node is down longer than your configured
> GCGraceSeconds (default: 10 days), it could have missed remove
> operations permanently"
>
> So that's probably a source of misunderstanding.
>
> --
> Virtually, Ned Wolpert
>
> "Settle thy studies, Faustus, and begin..."   --Marlowe
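Jonathan's advice above reduces to simple arithmetic: every node must be fully repaired within GCGraceSeconds of any delete, or a tombstone can be garbage-collected before it ever reaches a replica that missed it. A minimal sketch (the safety factor is an arbitrary illustrative choice, not a Cassandra setting):

```python
# Rule of thumb from the thread: run repair more often than
# GCGraceSeconds, or a replica that missed a delete may never learn of
# the tombstone before it is purged -- and the deleted data can then
# "come back" from the stale replica.

GC_GRACE_SECONDS = 10 * 24 * 3600   # Cassandra's default: 10 days
SAFETY_FACTOR = 0.5                 # assumed margin: repair twice per grace period

max_repair_interval_days = GC_GRACE_SECONDS * SAFETY_FACTOR / 86400
print(max_repair_interval_days)     # 5.0
```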