cassandra-user mailing list archives

From Aaron Griffith
Subject Cassandra Pig with network topology and data centers.
Date Fri, 29 Jul 2011 22:31:11 GMT
I currently have a 9-node Cassandra cluster set up as follows:

DC1: Six nodes
DC2: Three nodes

The tokens alternate between the two datacenters.

I have hadoop installed as tasktracker/datanodes on the 
three cassandra nodes in DC2.

There is another non cassandra node that is used as the hadoop namenode / job 

When running pig scripts pointed to a node in DC2 using LOCAL_QUORUM as read
consistency I am seeing network and cpu spikes on the nodes in DC1.  I was 
not expecting any impact on those nodes when local quorum is used.

Can read repair be causing the traffic/cpu spikes?  

The replication factor is 5 for DC1 and 1 for DC2.
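
As a sanity check on what LOCAL_QUORUM requires with these settings, here is a minimal sketch assuming the usual quorum formula floor(RF / 2) + 1. The function and variable names are only illustrative and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1

# Replication factors per data center, as described in this message.
replication = {"DC1": 5, "DC2": 1}

# Number of local replicas a LOCAL_QUORUM request would wait for in each DC.
local_quorum = {dc: quorum(rf) for dc, rf in replication.items()}
print(local_quorum)  # {'DC1': 3, 'DC2': 1}
```

If that formula applies here, a LOCAL_QUORUM read coordinated in DC2 should only need to wait for a single DC2 replica.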

When looking at the map tasks I am seeing input splits for machines in 
both data centers.  I am not sure what this means.  My thought is 
that it should only be getting data from the nodes in DC2.


