From: Michal Augustýn
To: user@cassandra.apache.org
Date: Sat, 12 Feb 2011 23:37:01 +0100
Subject: Re: per-connection "read-after-my-write" consistency

Hi,

I'm using .NET and I wrote my own client library (over Thrift), so I'm absolutely sure that both operations are performed over the same connection.
I can handle the current issue in the application, but I'm sure there will be future situations I can't handle at the application level.

So the suggestion is to use at least 3 nodes with RF=3 and CL.QUORUM for both writes and reads wherever strong consistency is required, right?
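A quick arithmetic check of that suggestion, as a sketch that assumes Cassandra's usual QUORUM definition of floor(RF/2) + 1 replicas (the helper names are illustrative only): with RF=2 a quorum is both replicas, so none may be down, whereas RF=3 keeps QUORUM reads and writes working with one replica down.

```python
def quorum(rf: int) -> int:
    """Replicas that must answer at QUORUM: a strict majority of the RF replicas."""
    return rf // 2 + 1


def tolerated_failures(rf: int) -> int:
    """Replicas of a row that may be down while QUORUM requests still succeed."""
    return rf - quorum(rf)


for rf in (2, 3):
    # RF=2: QUORUM needs 2 replicas, 0 replica(s) may be down
    # RF=3: QUORUM needs 2 replicas, 1 replica(s) may be down
    print(f"RF={rf}: QUORUM needs {quorum(rf)} replicas, "
          f"{tolerated_failures(rf)} replica(s) may be down")
```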

Thanks!

2011/2/12 Dan Hendry <dan.hendry.junk@gmail.com>

Are you using a higher-level client (Hector/Pelops/pycassa/etc.) or the raw Thrift API? Higher-level clients often pool connections, so two consecutive operations (write then read) may be performed over connections to different nodes.
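To illustrate the pooling point, here is a toy round-robin pool. `RoundRobinPool` is hypothetical and is not the API of Hector, Pelops or pycassa; it only shows how two consecutive operations can land on different coordinator nodes.

```python
import itertools


class RoundRobinPool:
    """Hypothetical pool: each borrow() returns the next node in turn."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)

    def borrow(self) -> str:
        # A real pool would hand out a live Thrift connection; here we only
        # return the name of the node the operation would be sent to.
        return next(self._cycle)


pool = RoundRobinPool(["A", "B"])
write_node = pool.borrow()  # the write goes to node A
read_node = pool.borrow()   # the following read goes to node B
print(write_node, read_node)  # A B
```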

If you are sure you are using the same connection (the raw Thrift API), there is a possible race condition. To the best of my understanding, here is how a write happens at CL.ONE in your case:

- You make a request to node A, which initiates the write on both node A and node B.

- The server reports success as soon as the write to node A OR node B is complete (can somebody else confirm?)

Typically the write to A will complete quicker, since that is the node you are connected to and there is additional network overhead in initiating the write on node B. Still, B completing first in roughly 1 of 1000 cases seems plausible, particularly if all nodes and the client are on the same network (or the same machine) with very low latencies.
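The race described above can be sketched as a tiny simulation. This is a toy model under Dan's own (explicitly unconfirmed) assumptions: the coordinator acknowledges a CL.ONE write once the first replica has applied it, and the follow-up CL.ONE read is answered by node A. The function name and all latency numbers are hypothetical, not measurements from this cluster.

```python
import random
from typing import Optional


def write_one_then_read(latency_a: float, latency_b: float) -> Optional[str]:
    """Toy model: CL.ONE write coordinated by node A, then an immediate
    CL.ONE read that node A answers from its own data.

    latency_a, latency_b: made-up times for each replica to apply the write.
    Returns the value the read sees, or None if the row is missing.
    """
    replicas = {"A": {}, "B": {}}
    latencies = {"A": latency_a, "B": latency_b}
    # CL.ONE: the client gets "success" as soon as the first replica applies the write.
    ack_time = min(latencies.values())
    # At that instant only replicas at least as fast hold the row; the slower one
    # applies it later, after the client has already issued its read.
    for name, latency in latencies.items():
        if latency <= ack_time:
            replicas[name]["row1"] = "value"
    return replicas["A"].get("row1")


print(write_one_then_read(0.1, 0.5))  # value  (A was faster, the read sees the row)
print(write_one_then_read(0.5, 0.1))  # None   (B acked first, the read on A misses)

# Monte Carlo with arbitrary latency distributions: the local write is usually
# faster, so misses are rare, but their rate depends entirely on these made-up numbers.
rng = random.Random(42)
trials = 100_000
misses = sum(
    write_one_then_read(rng.gauss(1.0, 0.07), rng.gauss(1.3, 0.07)) is None
    for _ in range(trials)
)
print(f"reads that missed the write: {misses} / {trials}")
```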

Cassandra lets you explicitly choose the trade-off between consistency and availability. When you read and write at ONE with RF=2, consistency is not guaranteed but high availability is (you can lose a node and continue to operate). If you require strong consistency, you will have to either read or write at consistency level ALL. My suggestion is to either design your application to tolerate inconsistency (if possible) or move to RF=3 with QUORUM reads and QUORUM writes.
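This rule of thumb can be checked against the usual overlap condition: a read is guaranteed to see an acknowledged write when the replicas it consults must intersect the replicas that acknowledged the write, i.e. R + W > RF. A minimal sketch that models only ONE, QUORUM (assumed to be floor(RF/2) + 1) and ALL, and ignores failed or timed-out writes; the function names are illustrative.

```python
def replicas_for(level: str, rf: int) -> int:
    """Replicas that must respond at a given consistency level (subset of levels)."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]


def read_sees_write(rf: int, read_cl: str, write_cl: str) -> bool:
    """Read and write replica sets are forced to overlap when R + W > RF."""
    return replicas_for(read_cl, rf) + replicas_for(write_cl, rf) > rf


for rf in (2, 3):
    for write_cl in ("ONE", "QUORUM", "ALL"):
        for read_cl in ("ONE", "QUORUM", "ALL"):
            print(f"RF={rf} write={write_cl:<6} read={read_cl:<6} "
                  f"consistent={read_sees_write(rf, read_cl, write_cl)}")
```

At RF=2 every combination except ONE/ONE passes, because QUORUM already means both replicas (the same as ALL). At RF=3, ONE combined with QUORUM fails, while QUORUM for both reads and writes passes and still tolerates one replica being down.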

Dan

From: Michal Augustýn [mailto:augustyn.michal@gmail.com]
Sent: February-12-11 4:13
To: user@cassandra.apache.org
Subject: per-connection "read-after-my-write" consistency

Hi,

I'm running 2 nodes with RF=2 (not optimal, I know), Cassandra 0.7.1.

Within one connection, I write a row (CL.ONE) and then immediately read the same row (CL.ONE), via Thrift.

I assumed that if I write a row to one node, I can immediately read that row from the same node.

That seems to be true in most cases, but roughly 1 in 1000 attempts doesn't work as expected: I get no row :(

Where is the problem, please? Should I use a different CL for reads and/or writes? I would just like to achieve "per-connection read-after-my-write consistency".

Thank you very much!

Augi
