To: user@cassandra.apache.org
From: Aaron Morton
Subject: Re: How to get the result from the closest node
Date: Wed, 27 Oct 2010 00:38:09 GMT
Message-id: <94a8a5b4-5f0a-6458-e565-acfd2fd39329@me.com>

Let's start with the simple case: all nodes have the same proximity to each other.

The client connects to a random node, called the coordinator. When a request is made, the coordinator asynchronously sends it to all nodes that are a replica for the requested key. It waits for the responses and, in the best case for a read, is able to return the data to the client as soon as CL nodes respond.
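
To make the "return as soon as CL nodes respond" part concrete, here is a rough Python sketch. It is not Cassandra code: `read_from_replica`, `coordinator_read`, the latencies and the row value are all made up for illustration, and it ignores the data/digest split between replicas.

```python
import concurrent.futures
import time


def read_from_replica(name, latency_s, value):
    """Stand-in for a network read; sleeps to simulate round-trip latency."""
    time.sleep(latency_s)
    return name, value


def coordinator_read(replicas, consistency_level):
    """Send the read to every replica, return once `consistency_level` have answered."""
    responses = []
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(replicas))
    try:
        futures = [
            pool.submit(read_from_replica, name, latency, value)
            for name, (latency, value) in replicas.items()
        ]
        for future in concurrent.futures.as_completed(futures):
            responses.append(future.result())
            if len(responses) >= consistency_level:
                break
    finally:
        # Do not block on the slower replicas; they finish in the background.
        pool.shutdown(wait=False)
    return responses


# Hypothetical replicas for key "100": D answers quickly, C is slow.
replicas = {"D": (0.01, "row-100"), "C": (0.30, "row-100")}
result = coordinator_read(replicas, consistency_level=1)
print(result)
```

With a consistency level of 1 the caller gets D's answer without waiting for C; with 2 it has to wait for the slower replica as well.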

The client does not have any knowledge of where the data is located in the cluster. That's the job of the coordinator, and it takes only 1 hop to get to each replica.

The consistency check is between the full data returned from one node and a digest returned from the others. If it fails then RR (read repair) will kick in (under CL > ONE; at CL ONE it is probabilistic).
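
A small Python sketch of the data-versus-digest comparison, under stated assumptions: the hash is MD5 (my guess, based on the 32-hex-digit digest in the DEBUG output quoted in this thread), the serialization format is invented, and `digest` and `needs_read_repair` are hypothetical helpers. The column names and timestamps come from the quoted logs; the values are made up.

```python
import hashlib


def digest(columns):
    """MD5 over the sorted (name, value, timestamp) triples of a row."""
    h = hashlib.md5()
    for name, value, timestamp in sorted(columns):
        h.update(f"{name}:{value}:{timestamp};".encode("utf-8"))
    return h.hexdigest()


def needs_read_repair(full_row, replica_digests):
    """Return the replicas whose digest disagrees with the full data response."""
    expected = digest(full_row)
    return [replica for replica, d in replica_digests.items() if d != expected]


# Full row as returned by one replica (values are hypothetical).
row_on_d = [
    ("first", "v1", 1288128358062000),
    ("middle", "v2", 1288128381467000),
    ("last", "v3", 1288128369639000),
]
# A stale copy on another replica: "middle" has an older, made-up timestamp.
stale_row_on_c = [
    ("first", "v1", 1288128358062000),
    ("middle", "v0", 1288128300000000),
    ("last", "v3", 1288128369639000),
]

stale = needs_read_repair(row_on_d, {"C": digest(stale_row_on_c)})
print(stale)  # ['C']
```

Only the digest crosses the network from the second replica, so the check is cheap when the copies agree; the full row is needed only when a mismatch triggers a repair.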

Read about the RackAwareStrategy (now NetworkTopologyStrategy) discussed here http://wiki.apache.org/cassandra/Operations?highlight=(network)|(strategy) and the Snitch. Also http://wiki.apache.org/cassandra/StorageConfiguration?highlight=(strategy)|(replication). These features let you tell Cassandra about your topology.
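
To illustrate what "telling Cassandra about your topology" buys you, here is a toy Python sketch of proximity ordering. It is not the Snitch API: the `TOPOLOGY` map, the rack names, the distance scale, and the placement of B and C are all assumptions; only "A is in NY, D in London" comes from the question below.

```python
# Hypothetical topology map: node -> (data center, rack).
TOPOLOGY = {
    "A": ("NY", "rack1"),
    "B": ("NY", "rack2"),
    "C": ("London", "rack1"),
    "D": ("London", "rack2"),
}


def distance(a, b):
    """0 = same node, 1 = same rack, 2 = same data center, 3 = remote data center."""
    if a == b:
        return 0
    dc_a, rack_a = TOPOLOGY[a]
    dc_b, rack_b = TOPOLOGY[b]
    if dc_a != dc_b:
        return 3
    return 1 if rack_a == rack_b else 2


def sort_by_proximity(coordinator, replicas):
    """Order replicas nearest-first, as seen from the coordinator."""
    return sorted(replicas, key=lambda node: distance(coordinator, node))


print(sort_by_proximity("A", ["D", "C", "B"]))  # ['B', 'D', 'C']
```

Once the cluster knows which nodes share a rack or data center, a coordinator in NY can prefer a replica in NY over one in London, provided the replication strategy actually placed a copy there.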

You will then want to use something like DCQUORUM or DCQUORUMSYNC (0.7+ AFAIK) for your requests.
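
The arithmetic behind this, as a Python sketch. I am assuming DCQUORUM means "a majority of the replicas in the coordinator's own data center"; `quorum`, `dc_quorum` and the RF=4 layout are hypothetical names and numbers for illustration only.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def dc_quorum(replicas_per_dc, local_dc):
    """Majority of the replicas that live in the coordinator's data center."""
    return quorum(replicas_per_dc[local_dc])


# Hypothetical layout: RF=4 split evenly across two data centers.
layout = {"NY": 2, "London": 2}
print(quorum(4))                # 3: at least one response must cross data centers
print(dc_quorum(layout, "NY"))  # 2: both responses can come from NY
```

With two replicas per data center, a plain QUORUM always waits on a remote node, while a data-center-local quorum can be satisfied without leaving NY.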

Hope that helps.
Aaron

On 27 Oct, 2010, at 11:58 AM, Joe Alex <joe.m.alex@gmail.com> wrote:

Hi,

I have Cassandra 0.6.6 running on 4 nodes with RF=2.

Let's say nodes A, B, C, D.

If I have clients A1, B1, C1, D1 connected to their respective nodes, what happens when A1 requests A for a key "100" for which D is responsible as per the Token? C has the second copy.
As per the logs, A1 requests A, which requests D and gets the data. D also runs a consistency check in the background on C.
If I have RF=3 I assume D will do 2 consistency checks.

If I need to get the data from A itself with minimum latency and network traversal between data centers, is this what I need to do?

1. Maybe RF=4 or at least >= 3
2. Adjust Read Consistency (ONE, QUORUM, DCQUORUM...)
3. Use RackAware strategy with DCQUORUM
4. Adjust Write Consistency

Is there a way to get/write the data from the closest node? For example, A is in NY, D in London etc.
For the above example key=100, A1 calls A and A gets the data all the way from D.
Also, when A1 writes key=100, the data needs to be written to D and C by A.

Probably need RF=4 for this in combination with DCQUORUM or ANY/ONE?
I want to know how everybody is approaching these cases.


A
DEBUG [pool-1-thread-21] 2010-10-26 18:29:25,231 CassandraServer.java (line 216) get_slice
DEBUG [pool-1-thread-21] 2010-10-26 18:29:25,231 StorageProxy.java (line 386) weakread reading SliceFromReadCommand(table='Keyspace1', key='100', column_parent='QueryPath(columnFamilyName='Standard2', superColumnName='null', columnName='null')', start='', finish='', reversed=true, count=1000000) from 1311748@/10.210.32.92
DEBUG [RESPONSE-STAGE:2] 2010-10-26 18:29:25,234 ResponseVerbHandler.java (line 52) Processing response on an async result from 1311748@/10.210.32.92
DEBUG [Timer-1] 2010-10-26 18:29:26,511 LoadDisseminator.java (line 36) Disseminating load info ...

D
DEBUG [ROW-READ-STAGE:5] 2010-10-26 18:29:19,415 SliceQueryFilter.java (line 116) collecting middle:false:1@1288128381467000
DEBUG [ROW-READ-STAGE:5] 2010-10-26 18:29:19,415 SliceQueryFilter.java (line 116) collecting last:false:3@1288128369639000
DEBUG [ROW-READ-STAGE:5] 2010-10-26 18:29:19,415 SliceQueryFilter.java (line 116) collecting first:false:4@1288128358062000
DEBUG [ROW-READ-STAGE:5] 2010-10-26 18:29:19,415 ReadVerbHandler.java (line 93) Read key 100; sending response to 1311748@/10.210.32.74
DEBUG [CONSISTENCY-MANAGER:4] 2010-10-26 18:29:19,416 ConsistencyChecker.java (line 73) Reading consistency digest for 100 from 1081388@[/10.210.32.92, /10.210.32.93]
DEBUG [RESPONSE-STAGE:1] 2010-10-26 18:29:19,418 ResponseVerbHandler.java (line 42) Processing response on a callback from 1081388@/10.210.32.93

C
DEBUG [ROW-READ-STAGE:4] 2010-10-26 18:29:25,237 SliceQueryFilter.java (line 116) collecting middle:false:1@1288128381467000
DEBUG [ROW-READ-STAGE:4] 2010-10-26 18:29:25,238 SliceQueryFilter.java (line 116) collecting last:false:3@1288128369639000
DEBUG [ROW-READ-STAGE:4] 2010-10-26 18:29:25,238 SliceQueryFilter.java (line 116) collecting first:false:4@1288128358062000
DEBUG [ROW-READ-STAGE:4] 2010-10-26 18:29:25,238 ReadVerbHandler.java (line 75) digest is c1ba97c56693d7fe4cbb9ac0544034b3
DEBUG [ROW-READ-STAGE:4] 2010-10-26 18:29:25,238 ReadVerbHandler.java (line 93) Read key 100; sending response to 1081388@/10.210.32.92