Subject: Re: Inserting new data, where the key points to a tombstone record.
From: Jools <joolski@gmail.com>
To: user@cassandra.apache.org
Date: Wed, 9 Jun 2010 10:09:01 +0100

Hi Martin,

Many thanks for the succinct and clear response.

I've got some pointers to move me in the right direction, many thanks.

However, as a final point of clarification, is there a particular reason
that insert does not raise an exception when trying to insert over an
existing key, or when the key points to a tombstone record?

(I've put a few rough code sketches of the points below at the very end of
this mail, in case they are useful to anyone else.)

--Jools

On 9 June 2010 09:53, Dr. Martin Grabmüller <Martin.Grabmueller@eleven.de> wrote:
> Hi Jools,
>
> what happens in Cassandra with your scenario is the following:
>
> 1) insert new record
>    -> the record is added to Cassandra's dataset (with the given timestamp)
>
> 2) delete record
>    -> a tombstone is added to the dataset (with the timestamp of the
>       deletion), which should be larger than the timestamp in 1);
>       otherwise, the delete will be lost
>
> 3) insert new record with the same key as the deleted record
>    -> the record is added as in 1), but the timestamp should be larger
>       than the timestamps from both 1) and 2)
>
> When you compact between 2) and 3), the record inserted at 1) will be
> thrown away, but the tombstone from 2) will not be thrown away *unless*
> the tombstone was created more than GCGraceSeconds (a configuration
> option) before the compaction.
>
> If you do not compact, all records and tombstones will be present in
> Cassandra's dataset, and each read operation checks which of the records
> has the highest timestamp before returning the most current record (or
> reports the key as missing, if the tombstone has the highest timestamp).
>
> So whether you compact or not does not make a difference for your
> scenario, as long as all replicas see the tombstone before GCGraceSeconds
> have elapsed. If that is not the case, it is possible that deleted
> records come alive again, because tombstones are deleted before all
> replicas had a chance to remove the deleted record.
>
> Your question about concurrently inserting the same key from different
> clients is another beast. The simple answer is: don't do it.
>
> The longer answer: either you use some external synchronisation mechanism
> (e.g., ZooKeeper), or you make sure that all clients use disjoint keys
> (UUIDs, or keys derived from the client's IP address + timestamp, that
> sort of thing).
>
> For keys representing user accounts or something similar, I would
> recommend using an external synchronisation mechanism, because for
> actions like account registration the latency caused by such a mechanism
> is usually not a problem.
>
> For data coming in quickly, where the overhead of synchronisation is not
> acceptable, use the UUID variant and reconcile the data on read.
>
> HTH,
>   Martin
>
> ------------------------------
> From: Jools [mailto:joolski@gmail.com]
> Sent: Wednesday, June 09, 2010 10:39 AM
> To: user@cassandra.apache.org
> Subject: Inserting new data, where the key points to a tombstone record.
>
> Hi,
>
> I've been developing a system against Cassandra over the last few weeks,
> and I'd like to ask the community for some advice on the best way to deal
> with inserting new data where the key currently points to a tombstone
> record.
>
> As with all distributed systems, this is always a tricky thing to deal
> with, so I thought I'd throw it to a wider audience.
>
> 1) insert new record.
> 2) delete record.
> 3) insert record with the same key as the deleted record.
>
> Now I know I can make this work if I flush and compact between 2 and 3.
> However, I don't want to rely on a flush and compact, and I'd like to
> code defensively against this scenario, so I've ended up checking whether
> the key exists: if it does, I know I can't insert the data; if it does
> not, I attempt an insert.
>
> Now, here lies the issue. If I have more than one client doing this at
> the same time, both trying to insert using the same key, one will succeed
> and one will fail.
> However, neither insert will give me an indication of which one actually
> succeeded.
>
> So should an insert against an existing key, or a deleted key, produce
> some kind of exception?
>
> Cheers,
>
> --Jools
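
P.S. For anyone else reading this in the archive, here is a rough sketch of
the timeline Martin describes, written against the 0.6-era Thrift interface.
The host/port, keyspace "Keyspace1", column family "Standard1" and key are
just the sample-config defaults and placeholders, and the exact client
signatures may differ in other versions. The point is that every write
carries an explicit timestamp, and the re-insert in step 3 only "wins" if
its timestamp is higher than the tombstone's; no exception is raised either
way.

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class TombstoneTimeline {
    public static void main(String[] args) throws Exception {
        TTransport transport = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();

        ColumnPath path = new ColumnPath("Standard1");
        path.setColumn("data".getBytes("UTF-8"));

        // 1) insert: stored with the timestamp we supply (microseconds by convention)
        long t1 = System.currentTimeMillis() * 1000;
        client.insert("Keyspace1", "mykey", path,
                      "first".getBytes("UTF-8"), t1, ConsistencyLevel.QUORUM);

        // 2) delete: writes a tombstone; its timestamp must be larger than t1,
        //    otherwise the delete is silently ignored on read
        long t2 = t1 + 1;
        client.remove("Keyspace1", "mykey", path, t2, ConsistencyLevel.QUORUM);

        // 3) re-insert under the same key: it only becomes visible if its
        //    timestamp is larger than both t1 and the tombstone's t2
        long t3 = t2 + 1;
        client.insert("Keyspace1", "mykey", path,
                      "second".getBytes("UTF-8"), t3, ConsistencyLevel.QUORUM);

        transport.close();
    }
}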
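
A second sketch, for the disjoint-keys approach Martin suggests for
concurrent writers: if every client generates its own row key, two clients
can never race on the same key, so the "which insert won?" question never
comes up. Plain random UUIDs from the JDK are enough for this; time-based
UUIDs would need an extra library.

import java.util.UUID;

public class DisjointKeys {
    // Each writer makes up its own row key, so concurrent inserts from
    // different clients can never collide on the same key.
    public static String newRowKey() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        System.out.println(newRowKey());   // prints a fresh 36-character random key
    }
}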
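
And last, a toy model (my own reading of Martin's explanation, not
Cassandra's actual code) of the read-time rule: of all stored versions of a
column, the one with the highest timestamp wins, and if that happens to be
a tombstone the key simply reads as missing rather than raising an error.

import java.util.List;

public class ReadReconcile {

    // One stored version of a column: a value plus the writer's timestamp.
    // A null value stands in for a tombstone.
    static final class Version {
        final byte[] value;
        final long timestamp;
        Version(byte[] value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
        boolean isTombstone() { return value == null; }
    }

    // Pick the version with the highest timestamp; if that version is a
    // tombstone, the column counts as not present (return null).
    static byte[] reconcile(List<Version> versions) {
        Version newest = null;
        for (Version v : versions) {
            if (newest == null || v.timestamp > newest.timestamp) {
                newest = v;
            }
        }
        return (newest == null || newest.isTombstone()) ? null : newest.value;
    }
}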