incubator-cassandra-user mailing list archives

From Ryan King <r...@twitter.com>
Subject Re: Cassandra Pig with network topology and data centers.
Date Sat, 30 Jul 2011 00:08:57 GMT
It'd be great if we had different settings for inter- and intra-DC read repair.

-ryan

On Fri, Jul 29, 2011 at 5:06 PM, Jake Luciani <jakers@gmail.com> wrote:
> Yes, it's read repair. You can lower the read repair chance to tune this.
>
>
>
> On Jul 29, 2011, at 6:31 PM, Aaron Griffith <aaron.c.griffith@gmail.com> wrote:
>
>> I currently have a 9-node Cassandra cluster set up as follows:
>>
>> DC1: Six nodes
>> DC2: Three nodes
>>
>> The tokens alternate between the two datacenters.
>>
>> I have hadoop installed as tasktracker/datanodes on the
>> three cassandra nodes in DC2.
>>
>> There is another, non-Cassandra node that is used as the Hadoop namenode / job
>> tracker.
>>
>> When running pig scripts pointed to a node in DC2 using LOCAL_QUORUM as read
>> consistency I am seeing network and cpu spikes on the nodes in DC1.  I was
>> not expecting any impact on those nodes when local quorum is used.
>>
>> Can read repair be causing the traffic/cpu spikes?
>>
>> The replication setting for DC1 is 5, and for DC2 it is 1.
>>
>> When looking at the map tasks I am seeing input splits for computers in
>> both data centers.  I am not sure what this means.  My thought is
>> that it should only be getting data from the nodes in DC2.
>>
>> Thanks
>>
>> Aaron
>>
>
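
To put rough numbers on the thread above, here is a minimal sketch of the arithmetic. It assumes that a read which triggers read repair contacts every replica of the row in both data centers, regardless of the consistency level the client asked for; that matches the explanation given in the reply, but check it against the Cassandra version in use. The function names are hypothetical and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def expected_remote_requests(read_repair_chance: float, remote_replicas: int) -> float:
    """Expected requests per read sent to replicas outside the local DC.

    Assumption: with probability read_repair_chance the read is sent to
    all replicas, including those in the remote data center.
    """
    return read_repair_chance * remote_replicas


# Replication settings quoted in the thread.
rf = {"DC1": 5, "DC2": 1}

# LOCAL_QUORUM in DC2 only waits for this many local replicas.
print("LOCAL_QUORUM in DC2 waits for", quorum(rf["DC2"]), "replica(s)")

# Illustrative read repair chances, not values taken from the thread.
for chance in (1.0, 0.1, 0.0):
    remote = expected_remote_requests(chance, rf["DC1"])
    print(f"read_repair_chance={chance}: ~{remote} DC1 requests per read")
```

Under these assumptions, LOCAL_QUORUM limits which replicas the coordinator waits for, not which replicas receive requests, so lowering the read repair chance is what reduces the load on DC1.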
