incubator-cassandra-user mailing list archives

From Wei Zhu <wz1...@yahoo.com>
Subject Re: Data not fully replicated with 2 nodes and replication factor 2
Date Tue, 18 Jun 2013 18:36:09 GMT
Cassandra doesn't do async replication like HBase does. You can run nodetool repair to ensure
consistency.

Or you can increase your read or write consistency level. As long as R + W > RF, you have strong
consistency. In your case, you can use CL.TWO for either reads or writes.
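
The R + W > RF rule is simple arithmetic and can be checked by hand. Here is a minimal sketch in Python; `is_strongly_consistent` is a hypothetical helper written for illustration, not part of pycassa or any Cassandra API. R and W are the number of replicas that must answer a read or acknowledge a write.

```python
def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """Return True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


RF = 2  # replication factor used in this thread

# CL.ONE for both reads and writes: 1 + 1 = 2, which is not greater than RF.
print(is_strongly_consistent(1, 1, RF))  # False

# CL.TWO for reads only: 2 + 1 = 3 > 2.
print(is_strongly_consistent(2, 1, RF))  # True

# CL.TWO for writes only: 1 + 2 = 3 > 2.
print(is_strongly_consistent(1, 2, RF))  # True
```

With RF = 2, raising either side to CL.TWO is enough. The trade-off is that the operation using CL.TWO needs both nodes to be up.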

-Wei 

----- Original Message -----

From: "James Lee" <James.Lee@metaswitch.com> 
To: user@cassandra.apache.org 
Sent: Tuesday, June 18, 2013 5:02:53 AM 
Subject: Data not fully replicated with 2 nodes and replication factor 2 



Hello, 

I’m seeing a strange problem with a 2-node Cassandra test deployment, where it seems that
data isn’t being replicated among the nodes as I would expect. I suspect this may be a configuration
issue of some kind, but have been unable to figure out what I should change. 

The setup is as follows: 
· Two Cassandra nodes in the cluster (they each have themselves and the other node as seeds
in cassandra.yaml). 
· Create 40 keyspaces, each with simple replication strategy and replication factor 2. 
· Populate 125,000 rows into each keyspace, using a pycassa client with a connection pool
pointed at both nodes (I’ve verified that pycassa does indeed send roughly half the writes
to each node). These are populated with writes using consistency level of 1. 
· Wait 30 minutes (to give replications a chance to complete). 
· Do random reads of the rows in the keyspaces, again using a pycassa client with a connection
pool pointed at both nodes. These are read using consistency level 1. 

I’m finding that the vast majority of reads are successful, but a small proportion (~0.1%)
are returned as Not Found. If I manually try to look up those keys using cassandra-cli, I
see that they are returned when querying one of the nodes, but not when querying the other.
So it seems like some of the rows have simply not been replicated. 
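
The symptom above can be pictured with a toy model. This is only a sketch under stated assumptions: `Replica`, `write` and `read` are hypothetical stand-ins, not Cassandra or pycassa code, and the model simply assumes that one replica missed a write without explaining why. Real Cassandra also reconciles by timestamp, which is omitted here.

```python
class Replica:
    """Toy stand-in for one node's local storage."""

    def __init__(self):
        self.rows = {}


def write(replicas, key, value, reached):
    """Apply the write only on the replicas whose indexes are in `reached`."""
    for i in reached:
        replicas[i].rows[key] = value


def read(replicas, key, contacted):
    """Return the first value found among the contacted replicas, else None."""
    for i in contacted:
        if key in replicas[i].rows:
            return replicas[i].rows[key]
    return None


nodes = [Replica(), Replica()]

# Assumption: the write was acknowledged by node 0 only (enough for
# consistency level 1) and never arrived at node 1.
write(nodes, "row-42", "value", reached=[0])

print(read(nodes, "row-42", contacted=[0]))     # value
print(read(nodes, "row-42", contacted=[1]))     # None, i.e. "Not Found"
print(read(nodes, "row-42", contacted=[0, 1]))  # value
```

In this model a read that contacts only the node that missed the write reports the row as missing, while a read that contacts both nodes always finds it.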

I’m not sure how I can monitor the status of ongoing replications, but the system has been
idle for tens of minutes and the total database size is only about 5GB, so I don’t think
there are any further ongoing operations. 

Any suggestions? In case it’s relevant, my setup is: 
· Cassandra 1.2.2, running on Linux 
· Sun Java 1.7.0_10-b18 64-bit 
· Java heap settings: -Xms8192M -Xmx8192M -Xmn2048M 

Thank you, 
James Lee 

