From: Richard Lowe
To: user@cassandra.apache.org
Subject: RE: default required in cassandra-topology.properties?
Date: Fri, 20 Apr 2012 07:32:59 +0000
Message-ID: <9FC471D82FE4194FA2B6A1403B475C9B08EE6BC5@DBXPRD0310MB384.eurprd03.prod.outlook.com>

As far as I know it’s not possible to leave replication factor undefined – if you do then Cassandra will default to RF=1 with SimpleStrategy.

The topology is local to each node, so unless all your nodes have the same topology file then it’s possible for them each to have a different idea about the topology of the cluster.

I’m not sure what you’re trying to achieve here, so I’ll give an example.

Say you have two datacenters, DC1 and DC2. It’s perfectly possible for nodes in DC1 to have a topology file that only mentions DC1 nodes and nodes in DC2 to have a topology file that only mentions DC2 nodes. You can then define one keyspace with strategy options DC1: 3 and another with DC2: 3 and this should work fine.

However if you had a keyspace with strategy options DC1: 3, DC2: 3 then you would AFAIK never be able to write to that column family because none of the nodes know enough about the topology; they can either address DC1, or address DC2, but not both.

If there were a third type of node that had topology defined for both DC1 and DC2 then these nodes would be able to update the DC1+DC2 keyspace, even though DC1-only and DC2-only nodes would not.
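
The two-datacenter example can be sketched as a small Python toy model. It only mirrors the reasoning in this message and is not how Cassandra itself decides anything; the addresses, the keyspace names (ks_dc1, ks_dc2, ks_both) and the can_place helper are all made up for illustration.

```python
# Toy model of the two-datacenter example. It only mirrors the reasoning in
# the message; it is not how Cassandra itself decides anything.
from collections import Counter

# Hypothetical per-node topology views: address -> (datacenter, rack).
DC1_VIEW = {f"10.1.0.{i}": ("DC1", "RAC1") for i in range(1, 4)}
DC2_VIEW = {f"10.2.0.{i}": ("DC2", "RAC1") for i in range(1, 4)}
FULL_VIEW = {**DC1_VIEW, **DC2_VIEW}

# Hypothetical keyspaces and their strategy options (datacenter -> replicas).
KEYSPACES = {
    "ks_dc1": {"DC1": 3},
    "ks_dc2": {"DC2": 3},
    "ks_both": {"DC1": 3, "DC2": 3},
}


def can_place(view, strategy_options):
    """True if the view lists enough nodes in every datacenter the keyspace names."""
    nodes_per_dc = Counter(dc for dc, _rack in view.values())
    # Counter returns 0 for a datacenter the view has never heard of.
    return all(nodes_per_dc[dc] >= rf for dc, rf in strategy_options.items())


for view_name, view in [("DC1-only", DC1_VIEW), ("DC2-only", DC2_VIEW), ("both", FULL_VIEW)]:
    usable = [ks for ks, opts in KEYSPACES.items() if can_place(view, opts)]
    print(f"{view_name}: {usable}")
```

Under these assumptions, only the view that lists both datacenters can place replicas for the DC1+DC2 keyspace, which is the situation described for the third type of node.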


So if there is a clear segregation in your data then splitting the topology may be OK, but if not then you will likely find that you can’t update the keyspace unless a node has sufficient knowledge of the topology.

Depending on your use case a simpler alternative may be to just run two clusters instead of trying to define the shape of a single one through topology definitions. I think what you’re talking about here is on the edge of what Cassandra is designed to do; it works best when all nodes are uniform and have the same understanding about the cluster.

Richard

From: Bill Au [mailto:bill.w.au@gmail.com]
Sent: 19 April 2012 19:58
To: user@cassandra.apache.org
Subject: Re: default required in cassandra-topology.properties?

 

I had thought that the topology file is used for replica placement only, such that for the token range that the unknown node is responsible for, data is still read and written there. It just won't be replicated since replication factor is not defined.

Bill

On Thu, Apr 19, 2012 at 1:18 PM, Richard Lowe <richard.lowe@arkivum.com> wrote:

Yes it is possible. Put the following as the last line of your topology file:

 

default=unknown:unknown

 

So long as you don’t have any DC or rack with this name your local node will not be able to address any nodes that aren’t explicitly given in its topology file.
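
As a rough illustration of the catch-all entry, here is a minimal Python sketch that reads a topology file in the address=DC:rack form and looks up a node's location. It is a toy reader, not Cassandra's snitch code; the addresses and the parse_topology and locate helpers are assumptions made for this example.

```python
# Toy reader for a property-file topology with a catch-all default entry.
# This is NOT Cassandra's snitch code; addresses and helper names are made up.

TOPOLOGY_TEXT = """\
# hypothetical addresses
10.0.0.1=DC1:RAC1
10.0.0.2=DC1:RAC2
default=unknown:unknown
"""


def parse_topology(text):
    """Return ({address: (dc, rack)}, default_location_or_None)."""
    nodes = {}
    default = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        dc, _, rack = value.strip().partition(":")
        if key.strip() == "default":
            default = (dc, rack)
        else:
            nodes[key.strip()] = (dc, rack)
    return nodes, default


def locate(address, nodes, default):
    """Explicitly listed nodes keep their location; everything else gets the default."""
    return nodes.get(address, default)


nodes, default = parse_topology(TOPOLOGY_TEXT)
print(locate("10.0.0.1", nodes, default))  # a listed node
print(locate("10.9.9.9", nodes, default))  # a node the file does not mention
```

In this sketch any address missing from the file lands in the "unknown" datacenter and rack, so a keyspace whose strategy options name only real datacenters never selects it.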

 

However bear in mind that, whilst Cassandra won’t try to use replication factor to store to these ‘unknown’ nodes, their token may mean that the ‘natural’ home for a row is on a node that is not addressable. This can create holes in your dataset and create situations where data can ‘disappear’ because the bloom filter says the data is on a particular node (due to its token) but the coordinator can’t contact that node to get at the data.
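
The point about tokens can be pictured with a small Python sketch of a token ring. It is a deliberately simplified model (a made-up 0-99 token space, four hypothetical nodes, no replication), not Cassandra's partitioner; natural_owner and addressable are illustrative helpers.

```python
# Simplified token ring: each node owns the range that ends at its own token.
# Token space, addresses and helper names are hypothetical.
import bisect

# (token, address) pairs sorted by token.
RING = [(0, "10.0.0.1"), (25, "10.9.9.9"), (50, "10.0.0.2"), (75, "10.9.9.8")]
TOKENS = [token for token, _address in RING]

# Nodes explicitly listed in the local topology file; the rest are 'unknown'.
KNOWN = {"10.0.0.1", "10.0.0.2"}


def natural_owner(row_token):
    """First node whose token is >= the row's token, wrapping around the ring."""
    index = bisect.bisect_left(TOKENS, row_token)
    return RING[index % len(RING)][1]


def addressable(row_token):
    """True if the row's natural home is a node the local file knows about."""
    return natural_owner(row_token) in KNOWN


for row_token in (10, 40, 60, 90):
    print(row_token, natural_owner(row_token), addressable(row_token))
```

In this model the rows with tokens 10 and 60 have their natural home on nodes the local file treats as unknown, which is the kind of hole in the dataset described above.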

 

Careful use of replication factor and NetworkTopologyStrategy can help with this, but you should make sure that a node really doesn’t need to contact the unknown nodes before marking them as such.

 

 

Richard

 

 

From: Bill Au [mailto:bill.w.au@gmail.com]
Sent: 19 April 2012 17:16
To: user@cassandra.apache.org
Subject: default required in cassandra-topology.properties?

 

All the examples of cassandra-topology.properties that I have seen have a default entry assigning unknown nodes to a specific data center and rack. Is it possible to have Cassandra ignore unknown nodes for the purpose of replication?

Bill

 
