Since this thread already contains the system setup, I just want to ask another question:

Suppose you have 3 data centers (DC1, DC2 and DC3) and a keyspace whose strategy options give each DC one replica. If you only write to the nodes in DC1, what path do the replicas take, assuming you've correctly interleaved and evenly spaced the tokens of all the nodes? If you write a record to a node in DC1, will that node replicate it to the node in DC2, which then replicates it to the node in DC3? Or will the node in DC1 replicate the record to both DC2 and DC3 itself?


On Thu, Mar 15, 2012 at 11:26 PM, Alexandru Sicoe <adsicoe@gmail.com> wrote:
Sorry for that last message. I was confused because I thought I needed to use the DseSimpleSnitch, but of course I can use the PropertyFileSnitch, which lets me set up the 3-data-center configuration you explained.


On Thu, Mar 15, 2012 at 10:56 AM, Alexandru Sicoe <adsicoe@gmail.com> wrote:
Thanks Tyler,
 I see that cassandra.yaml has "endpoint_snitch: com.datastax.bdp.snitch.DseSimpleSnitch". Will this pick up the configuration from the cassandra-topology.properties file, as the PropertyFileSnitch does? Or is there some other way of telling it which nodes are in which DC?


On Wed, Mar 14, 2012 at 9:09 PM, Tyler Hobbs <tyler@datastax.com> wrote:
Yes, you can do this.

You will want to have three DCs: DC1 with [1, 2, 3], DC2 with [4, 5, 6], and DC3 with [7, 8, 9].  For your normal data keyspace, the replication strategy should be NTS (NetworkTopologyStrategy), and the strategy_options should have some replicas in each of the three DCs.  For example: {DC1: 3, DC2: 3, DC3: 3} if you need that level of replication in each one (although you probably only want an RF of 1 for DC3).
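To illustrate how strategy_options translate into replica choices, here is a minimal Python sketch of NTS-style placement. It is a simplified model, not Cassandra's actual code: it ignores racks, uses small made-up token values, and assumes the nodes sit on the ring in the interleaved order 1, 4, 7, 2, 5, 8, 3, 6, 9. The names RING, DC_OF and replicas_for are hypothetical.

```python
from bisect import bisect_left

# Hypothetical ring: (token, node) pairs sorted by token. The tokens are
# small illustrative numbers, not real partitioner tokens.
RING = [(0, 1), (10, 4), (20, 7), (30, 2), (40, 5),
        (50, 8), (60, 3), (70, 6), (80, 9)]

# Data center membership as described above.
DC_OF = {1: "DC1", 2: "DC1", 3: "DC1",
         4: "DC2", 5: "DC2", 6: "DC2",
         7: "DC3", 8: "DC3", 9: "DC3"}


def replicas_for(key_token, strategy_options):
    """Walk the ring clockwise from the key's token and take nodes until
    each DC has its configured replica count (racks are ignored)."""
    tokens = [token for token, _ in RING]
    # The first node whose token is >= the key's token owns the key;
    # wrap around to the start of the ring if there is none.
    start = bisect_left(tokens, key_token) % len(RING)
    remaining = dict(strategy_options)
    chosen = []
    for step in range(len(RING)):
        _, node = RING[(start + step) % len(RING)]
        dc = DC_OF[node]
        if remaining.get(dc, 0) > 0:
            chosen.append(node)
            remaining[dc] -= 1
    return chosen


one_per_dc = {"DC1": 1, "DC2": 1, "DC3": 1}
print(replicas_for(5, one_per_dc))
print(replicas_for(85, one_per_dc))
```

In this model, every key ends up with exactly the configured number of replicas in each DC, no matter which node's range it falls into.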

Your clients that are performing writes should only open connections against the nodes in DC1, and you should write at CL.ONE or CL.LOCAL_QUORUM.  Likewise for reads, your clients should only connect to nodes in DC2, and you should read at CL.ONE or CL.LOCAL_QUORUM.

The nodes in DC3 should run as analytics nodes.  I believe the default CL for m/r jobs is ONE, which would work.

As far as tokens go, interleaving all three DCs and evenly spacing the tokens will work.  For example, the ordering of your nodes might be [1, 4, 7, 2, 5, 8, 3, 6, 9].
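As a rough sketch of the token calculation, the following Python snippet spaces nine tokens evenly and assigns them in the interleaved order above. It assumes the RandomPartitioner, whose token range is 0 to 2**127; the function name interleaved_tokens is made up for illustration.

```python
# Assumption: RandomPartitioner, whose tokens range from 0 to 2**127.
RING_SIZE = 2 ** 127


def interleaved_tokens(ring_order):
    """Give the i-th node in ring_order the i-th evenly spaced token."""
    count = len(ring_order)
    return {node: i * RING_SIZE // count for i, node in enumerate(ring_order)}


# Interleaved ordering from the example: one node from each DC in turn.
ring_order = [1, 4, 7, 2, 5, 8, 3, 6, 9]
tokens = interleaved_tokens(ring_order)

for node in ring_order:
    print(f"node {node}: initial_token = {tokens[node]}")
```

Each printed value would go into the initial_token setting of the corresponding node's cassandra.yaml.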

On Wed, Mar 14, 2012 at 12:05 PM, Alexandru Sicoe <adsicoe@gmail.com> wrote:
Hi everyone,
 I want to test out the Datastax Enterprise software to have a mixed workload setup with an analytics part and a real-time part.

 However, I am not sure how to configure it to achieve what I want: I will have 3 real machines on one side of a gateway (1,2,3) and 6 VMs on the other (4,5,6,7,8,9).
 1,2,3 will each have a normal Cassandra node that just takes data directly from my data sources. I want them to replicate the data to the 6 VMs. Now, out of those 6 VMs, 4,5,6 will run normal Cassandra nodes and 7,8,9 will run Analytics nodes. So I only want to write to 1,2,3, serve user reads only from 4,5,6, and do analytics on 7,8,9.  Can I achieve this by configuring 1,2,3,4,5,6 as normal nodes and the rest as analytics nodes? If I alternate the tokens as explained in http://www.datastax.com/docs/1.0/datastax_enterprise/init_dse_cluster#init-dse, is that analogous to having 3 DCs, each getting its own replica?


Tyler Hobbs