Subject: Re: Inserting new data, where the key points to a tombstone record.
From: Jools <joolski@gmail.com>
To: user@cassandra.apache.org
Date: Wed, 9 Jun 2010 10:09:01 +0100

Hi Martin,

Many thanks for the succinct and clear response.

I've got some pointers to move me in the right direction, many thanks.

However, as a final point of clarification, is there a particular reason
that insert does not raise an exception when trying to insert over an
existing key, or when the key points to a tombstone record?

(I've put a few rough code sketches of the points below at the very end of
this mail, in case they are useful to anyone else.)

--Jools

On 9 June 2010 09:53, Dr. Martin Grabmüller <Martin.Grabmueller@eleven.de> wrote:
> Hi Jools,
>
> what happens in Cassandra with your scenario is the following:
>
> 1) insert new record
>    -> the record is added to Cassandra's dataset (with the given timestamp)
>
> 2) delete record
>    -> a tombstone is added to the dataset (with the timestamp of the
>       deletion), which should be larger than the timestamp in 1);
>       otherwise, the delete will be lost
>
> 3) insert new record with the same key as the deleted record
>    -> the record is added as in 1), but the timestamp should be larger
>       than the timestamps from both 1) and 2)
>
> When you compact between 2) and 3), the record inserted at 1) will be
> thrown away, but the tombstone from 2) will not be thrown away *unless*
> the tombstone was created more than GCGraceSeconds (a configuration
> option) before the compaction.
>
> If you do not compact, all records and tombstones will be present in
> Cassandra's dataset, and each read operation checks which of the records
> has the highest timestamp before returning the most current record (or
> reports the key as missing, if the tombstone has the highest timestamp).
>
> So whether you compact or not does not make a difference for your
> scenario, as long as all replicas see the tombstone before GCGraceSeconds
> have elapsed. If that is not the case, it is possible that deleted
> records come alive again, because tombstones are deleted before all
> replicas had a chance to remove the deleted record.
>
> Your question about concurrently inserting the same key from different
> clients is another beast. The simple answer is: don't do it.
>
> The longer answer: either you use some external synchronisation mechanism
> (e.g., ZooKeeper), or you make sure that all clients use disjoint keys
> (UUIDs, or keys derived from the client's IP address + timestamp, that
> sort of thing).
>
> For keys representing user accounts or something similar, I would
> recommend using an external synchronisation mechanism, because for
> actions like account registration the latency caused by such a mechanism
> is usually not a problem.
>
> For data coming in quickly, where the overhead of synchronisation is not
> acceptable, use the UUID variant and reconcile the data on read.
>
> HTH,
>   Martin
>
> ------------------------------
> From: Jools [mailto:joolski@gmail.com]
> Sent: Wednesday, June 09, 2010 10:39 AM
> To: user@cassandra.apache.org
> Subject: Inserting new data, where the key points to a tombstone record.
>
> Hi,
>
> I've been developing a system against Cassandra over the last few weeks,
> and I'd like to ask the community for some advice on the best way to deal
> with inserting new data where the key currently points to a tombstone
> record.
>
> As with all distributed systems, this is always a tricky thing to deal
> with, so I thought I'd throw it to a wider audience.
>
> 1) insert new record.
> 2) delete record.
> 3) insert record with the same key as the deleted record.
>
> Now I know I can make this work if I flush and compact between 2 and 3.
> However, I don't want to rely on a flush and compact, and I'd like to
> code defensively against this scenario, so I've ended up checking whether
> the key exists: if it does, I know I can't insert the data; if it does
> not, I attempt an insert.
>
> Now, here lies the issue. If I have more than one client doing this at
> the same time, both trying to insert using the same key, one will succeed
> and one will fail.
> However, neither insert will give me an indication of which one actually
> succeeded.
>
> So should an insert against an existing key, or a deleted key, produce
> some kind of exception?
>
> Cheers,
>
> --Jools
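
P.S. For anyone else reading this in the archive, here is a rough sketch of
the timeline Martin describes, written against the 0.6-era Thrift interface.
The host/port, keyspace "Keyspace1", column family "Standard1" and key are
just the sample-config defaults and placeholders, and the exact client
signatures may differ in other versions. The point is that every write
carries an explicit timestamp, and the re-insert in step 3 only "wins" if
its timestamp is higher than the tombstone's; no exception is raised either
way.

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class TombstoneTimeline {
    public static void main(String[] args) throws Exception {
        TTransport transport = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();

        ColumnPath path = new ColumnPath("Standard1");
        path.setColumn("data".getBytes("UTF-8"));

        // 1) insert: stored with the timestamp we supply (microseconds by convention)
        long t1 = System.currentTimeMillis() * 1000;
        client.insert("Keyspace1", "mykey", path,
                      "first".getBytes("UTF-8"), t1, ConsistencyLevel.QUORUM);

        // 2) delete: writes a tombstone; its timestamp must be larger than t1,
        //    otherwise the delete is silently ignored on read
        long t2 = t1 + 1;
        client.remove("Keyspace1", "mykey", path, t2, ConsistencyLevel.QUORUM);

        // 3) re-insert under the same key: it only becomes visible if its
        //    timestamp is larger than both t1 and the tombstone's t2
        long t3 = t2 + 1;
        client.insert("Keyspace1", "mykey", path,
                      "second".getBytes("UTF-8"), t3, ConsistencyLevel.QUORUM);

        transport.close();
    }
}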
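
A second sketch, for the disjoint-keys approach Martin suggests for
concurrent writers: if every client generates its own row key, two clients
can never race on the same key, so the "which insert won?" question never
comes up. Plain random UUIDs from the JDK are enough for this; time-based
UUIDs would need an extra library.

import java.util.UUID;

public class DisjointKeys {
    // Each writer makes up its own row key, so concurrent inserts from
    // different clients can never collide on the same key.
    public static String newRowKey() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        System.out.println(newRowKey());   // prints a fresh 36-character random key
    }
}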
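
And last, a toy model (my own reading of Martin's explanation, not
Cassandra's actual code) of the read-time rule: of all stored versions of a
column, the one with the highest timestamp wins, and if that happens to be
a tombstone the key simply reads as missing rather than raising an error.

import java.util.List;

public class ReadReconcile {

    // One stored version of a column: a value plus the writer's timestamp.
    // A null value stands in for a tombstone.
    static final class Version {
        final byte[] value;
        final long timestamp;
        Version(byte[] value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
        boolean isTombstone() { return value == null; }
    }

    // Pick the version with the highest timestamp; if that version is a
    // tombstone, the column counts as not present (return null).
    static byte[] reconcile(List<Version> versions) {
        Version newest = null;
        for (Version v : versions) {
            if (newest == null || v.timestamp > newest.timestamp) {
                newest = v;
            }
        }
        return (newest == null || newest.isTombstone()) ? null : newest.value;
    }
}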