incubator-cassandra-user mailing list archives

From Wei Zhu <wz1...@yahoo.com>
Subject Re: Data not fully replicated with 2 nodes and replication factor 2
Date Wed, 19 Jun 2013 18:43:02 GMT
You have a lot of dropped mutations, which means those writes might not have gone through.
Since you write at CL.ONE, your client doesn't see an exception if a write fails on only
one node.
I think hints are only stored when the other node is down, not for dropped mutations. (Correct
me if I am wrong. Actually, wouldn't it be a good idea to store hints for dropped mutations
and replay them later?)

To solve your issue, as I mentioned, either run nodetool repair or raise your consistency
level. By the way, if you see that many dropped mutations, you are probably writing faster
than your cluster can handle.
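
As a rough illustration of why raising the consistency level helps with 2 nodes and replication factor 2: a read is only guaranteed to reach a replica that acknowledged the write when the read and write replica counts together exceed the replication factor. The function name below is made up for this sketch and is not part of any Cassandra API.

```python
def replicas_overlap(write_replicas: int, read_replicas: int, replication_factor: int) -> bool:
    """True when every read must touch at least one replica that acknowledged the write."""
    return write_replicas + read_replicas > replication_factor


RF = 2                      # replication factor from this thread
ONE = 1
QUORUM = RF // 2 + 1        # 2 when RF is 2
ALL = RF

# Write at ONE, read at ONE: the read may land on the replica that dropped the mutation.
print(replicas_overlap(ONE, ONE, RF))        # False
# Write and read at QUORUM (both replicas when RF is 2): overlap is guaranteed.
print(replicas_overlap(QUORUM, QUORUM, RF))  # True
# Write at ALL, read at ONE: overlap is guaranteed as well.
print(replicas_overlap(ALL, ONE, RF))        # True
```

With RF 2, QUORUM is the same as ALL, so the stricter levels also mean a write fails whenever either node is unavailable.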

-Wei 

----- Original Message -----

From: "James Lee" <James.Lee@metaswitch.com> 
To: user@cassandra.apache.org 
Sent: Wednesday, June 19, 2013 2:22:39 AM 
Subject: RE: Data not fully replicated with 2 nodes and replication factor 2 

The test tool I am using catches any exceptions on the original writes and resubmits the write
request until it's successful (bailing out after 5 failures). So for each key Cassandra has
reported a successful write. 
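
For readers following along, the retry behaviour described above could look roughly like the sketch below. This is not the actual test tool: `write_with_retry` and `write_fn` are hypothetical names, and `write_fn` stands in for whatever client call performs the insert. Note that a success at CL.ONE only means one replica acknowledged the write.

```python
def write_with_retry(write_fn, key, value, max_failures=5):
    """Call write_fn(key, value) until it succeeds.

    Returns the number of failed attempts before the success.
    Re-raises the last exception once max_failures failures have occurred.
    """
    failures = 0
    while True:
        try:
            write_fn(key, value)
            return failures
        except Exception:
            failures += 1
            if failures >= max_failures:
                raise  # bail out after max_failures failures


# Usage with a stand-in writer (hypothetical; a real client call would go here).
store = {}

def stub_write(key, value):
    store[key] = value

print(write_with_retry(stub_write, "row1", "data"))  # 0
```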


Nodetool says the following - I'm guessing the pending hinted handoff is the interesting bit?


comet-mvs01:/dsc-cassandra-1.2.2# ./bin/nodetool tpstats
Pool Name                       Active   Pending      Completed   Blocked  All time blocked
ReadStage                            0         0          35445         0                 0
RequestResponseStage                 0         0        1535171         0                 0
MutationStage                        0         0        3038941         0                 0
ReadRepairStage                      0         0           2695         0                 0
ReplicateOnWriteStage                0         0              0         0                 0
GossipStage                          0         0           2898         0                 0
AntiEntropyStage                     0         0              0         0                 0
MigrationStage                       0         0            245         0                 0
MemtablePostFlusher                  0         0           1260         0                 0
FlushWriter                          0         0            633         0               212
MiscStage                            0         0              0         0                 0
commitlog_archiver                   0         0              0         0                 0
InternalResponseStage                0         0              0         0                 0
HintedHandoff                        1         1              0         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
BINARY                       0
READ                         0
MUTATION                 60427
_TRACE                       0
REQUEST_RESPONSE             0
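
To check for dropped messages across several nodes, the "Message type / Dropped" section of the output above can be picked apart with a small script. The sketch below assumes the two-column layout shown in this message (Cassandra 1.2.2); other versions may print the section differently, and `dropped_counts` is a name made up for this example.

```python
def dropped_counts(tpstats_output: str) -> dict:
    """Return {message_type: dropped_count} from the text printed by nodetool tpstats."""
    counts = {}
    in_dropped_section = False
    for line in tpstats_output.splitlines():
        fields = line.split()
        if fields[:3] == ["Message", "type", "Dropped"]:
            in_dropped_section = True   # rows after this header are "<type> <count>"
            continue
        if in_dropped_section and len(fields) == 2 and fields[1].isdigit():
            counts[fields[0]] = int(fields[1])
    return counts


# A shortened copy of the output quoted in this message.
sample = """\
Pool Name Active Pending Completed Blocked All time blocked
MutationStage 0 0 3038941 0 0

Message type Dropped
READ 0
MUTATION 60427
_TRACE 0
"""

nonzero = {name: n for name, n in dropped_counts(sample).items() if n > 0}
print(nonzero)  # {'MUTATION': 60427}
```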


Looking at the hints column family in the system keyspace, I see one row with a large number
of columns. Presumably that along with the nodetool output above suggests there are hinted
handoffs pending? How long should I expect these to remain for? 

Ah, actually now that I re-run the command it seems that nodetool now reports that hint as
completed and there are no hints left in the system keyspace on either node. I'm still seeing
failures to read the data I'm expecting though, as before. Note that I've run this with a
smaller data set (2M rows, 1GB data total) for this latest test. 

Thanks, 
James 


-----Original Message----- 
From: Robert Coli [mailto:rcoli@eventbrite.com] 
Sent: 18 June 2013 19:45 
To: user@cassandra.apache.org 
Subject: Re: Data not fully replicated with 2 nodes and replication factor 2 

On Tue, Jun 18, 2013 at 11:36 AM, Wei Zhu <wz1975@yahoo.com> wrote: 
> Cassandra doesn't do async replication like HBase does. You can run
> nodetool repair to ensure the consistency.

While this answer is true, it is somewhat non-responsive to the OP. 

If the OP didn't see a timeout exception, the theoretical worst case is that he should have
hints stored for the writes that initially failed to replicate. His nodes should not be
failing GC with a total data size of 5 GB on an 8 GB heap, so those hints should deliver
quite quickly. After 30 minutes those hints should certainly be delivered.

@OP: do you see hints being stored? Does nodetool tpstats indicate dropped messages?

=Rob 

