Subject: Re: Confusion regarding the terms "replica" and "replication factor"
From: aaron morton <aaron@thelastpickle.com>
Date: Thu, 24 May 2012 21:39:27 +1200
To: user@cassandra.apache.org

This is partly historical. NTS (as it is now) has not always existed and was not always the default. In days gone by, a fella could run a mighty fine key-value store using just the Simple Replication Strategy.

A different way to visualise it is a single ring with a Z axis for the DCs. When you look at the ring from the top you can see all the nodes. When you look at it from the side you can see the nodes are on levels that correspond to their DC. SimpleStrategy looks at the ring from the top. NTS works through the layers of the ring.
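To make the two views concrete, here is a minimal, illustrative Python sketch of the clockwise ring walk. The node names, tokens, and per-DC counts are invented for the example, and it deliberately ignores racks and the other details of Cassandra's real strategies; it only shows the shape of the idea. SimpleStrategy is DC-blind (the view from the top), while the NTS-style walk fills a separate quota per DC (one layer per DC):

import bisect
from hashlib import md5

# Hypothetical 5-node, 2-DC cluster; tokens are evenly spaced for clarity.
RING = sorted([
    (0 * 2**127 // 5, "node1", "DC1"),
    (1 * 2**127 // 5, "node2", "DC2"),
    (2 * 2**127 // 5, "node3", "DC1"),
    (3 * 2**127 // 5, "node4", "DC2"),
    (4 * 2**127 // 5, "node5", "DC1"),
])

def row_token(key: bytes) -> int:
    # RandomPartitioner-style token: MD5 of the key, folded into [0, 2**127).
    return int(md5(key).hexdigest(), 16) % 2**127

def ring_walk(ring, token):
    # Walk clockwise, starting at the first node whose token is >= the row token.
    tokens = [t for t, _, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(len(ring))]

def simple_strategy(ring, token, rf):
    # "View from the top": take the first RF nodes clockwise, DC-blind.
    return [node for _, node, _ in ring_walk(ring, token)[:rf]]

def network_topology_strategy(ring, token, rf_per_dc):
    # "One layer per DC": the same walk, but each DC fills its own quota,
    # so each DC behaves as if it had its own ring.
    replicas, quota = [], dict(rf_per_dc)
    for _, node, dc in ring_walk(ring, token):
        if quota.get(dc, 0) > 0:
            replicas.append(node)
            quota[dc] -= 1
    return replicas

t = row_token(b"some-row-key")
print(simple_strategy(RING, t, 3))                               # 3 replicas, DCs ignored
print(network_topology_strategy(RING, t, {"DC1": 2, "DC2": 2}))  # 2 per DC

The DC3 question from the quoted thread below is revisited with these same helpers at the end of this message.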
> If the hierarchy is Cluster -> DataCenter -> Node, why exactly do we need
> globally unique node tokens even though nodes are at the lowest level in
> the hierarchy?

Nodes having a DC is a feature of *some* snitches and is utilised by *some* of the replication strategies (and by the messaging system for network efficiency). For background, the mapping from row tokens to nodes is based on http://en.wikipedia.org/wiki/Consistent_hashing

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/05/2012, at 1:07 AM, java jalwa wrote:

> Thanks Aaron. That makes things clear.
> So I guess the 0 - 2^127 range for tokens corresponds to a cluster-level,
> top-level ring, and then you add some logic on top of that with NTS to
> logically segment that range into sub-rings as per the notion of data
> centers defined in NTS. What's the advantage of having a single top-level
> ring? Intuitively it seems like each replication group could have a
> separate ring so that the same tokens can be assigned to nodes in
> different DCs. If the hierarchy is Cluster -> DataCenter -> Node, why
> exactly do we need globally unique node tokens even though nodes are at
> the lowest level in the hierarchy?
>
> Thanks again.
>
>
> On Wed, May 23, 2012 at 3:14 AM, aaron morton wrote:
>>> Now if a row key hash is mapped to a range owned by a node in DC3,
>>> will the node in DC3 still store the key as determined by the
>>> partitioner and then walk the ring and store 2 replicas each in DC1
>>> and DC2 ?
>> No, only nodes in the DCs specified in the NTS configuration will be
>> replicas.
>>
>>> Or will the co-ordinator node be aware of the replica placement
>>> strategy, and override the partitioner's decision and walk the ring
>>> until it first encounters a node in DC1 or DC2 ? and then place the
>>> remaining replicas ?
>> NTS considers each DC to have its own ring. This can make token
>> selection in a multi-DC environment confusing at times. There is
>> something in the DataStax docs about it.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 23/05/2012, at 3:16 PM, java jalwa wrote:
>>
>>> Hi all,
>>> I am a bit confused regarding the terms "replica" and "replication
>>> factor". Assume that I am using RandomPartitioner and
>>> NetworkTopologyStrategy for replica placement.
>>> From what I understand, with a RandomPartitioner, a row key will
>>> always be hashed and be stored on the node that owns the range to
>>> which the key is mapped.
>>> http://www.datastax.com/docs/1.0/cluster_architecture/replication#networktopologystrategy
>>> The example there talks about having 2 data centers and a replication
>>> factor of 4 with 2 replicas in each data center, so the strategy is
>>> configured as DC1:2 and DC2:2. Now suppose I add another data center,
>>> DC3, and do not change the NetworkTopologyStrategy.
>>> Now if a row key hash is mapped to a range owned by a node in DC3,
>>> will the node in DC3 still store the key as determined by the
>>> partitioner and then walk the ring and store 2 replicas each in DC1
>>> and DC2 ? Will that mean that I will then have 5 replicas in the
>>> cluster and not 4 ? Or will the co-ordinator node be aware of the
>>> replica placement strategy, and override the partitioner's decision
>>> and walk the ring until it first encounters a node in DC1 or DC2 ?
>>> and then place the remaining replicas ?
>>>
>>> Thanks.
>>
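Tying this back to the DC3 question in the thread: the toy helpers from the sketch near the top of this message (invented names, simplified behaviour, not Cassandra's actual code) show why no fifth replica appears. Even if a hypothetical DC3 node is given the exact token that "owns" the row, the NTS-style walk passes straight over it because DC3 has no quota in the strategy options:

# Reuses RING, row_token and network_topology_strategy from the sketch above.
t = row_token(b"some-row-key")

# A hypothetical DC3 node whose token equals the row's token, so the
# partitioner's "owning" range now falls in DC3.
dc3_ring = sorted(RING + [(t, "node6", "DC3")])

print(network_topology_strategy(dc3_ring, t, {"DC1": 2, "DC2": 2}))
# Still exactly 4 replicas, all in DC1/DC2: node6 is walked past and
# skipped, matching the answer quoted above.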