cassandra-user mailing list archives

From aaron morton <>
Subject Re: 4/20 nodes get disproportionate amount of mutations
Date Tue, 23 Aug 2011 08:43:17 GMT
Dropping messages in ReadRepair is odd. Are you also dropping mutations?

There are two tasks performed on the ReadRepair stage: first the digests from the replicas are compared, and second the repair itself happens on the same stage. Comparing digests is quick. Doing the repair can take a bit longer, as all the cf's returned are collated, filtered, and the deletes removed.
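
Roughly, as an illustrative sketch only (made-up names in Java, not Cassandra's actual RowRepairResolver code), the two tasks look something like:

import java.security.MessageDigest;
import java.util.*;

// Illustrative only - hypothetical names, not Cassandra internals.
public class ReadRepairSketch
{
    // Task 1 (cheap): hash each replica's response and compare the hashes.
    static boolean digestsMatch(List<byte[]> responses) throws Exception
    {
        byte[] first = null;
        for (byte[] response : responses)
        {
            byte[] digest = MessageDigest.getInstance("MD5").digest(response);
            if (first == null)
                first = digest;
            else if (!Arrays.equals(first, digest))
                return false; // mismatch -> the full repair path runs
        }
        return true;
    }

    // Task 2 (can be slow): collate the columns from every replica, keep the
    // newest timestamp for each, and send the difference back to the replicas
    // that were missing it. A very wide row makes this loop expensive.
    static Map<String, Long> resolve(List<Map<String, Long>> replicaRows)
    {
        Map<String, Long> merged = new HashMap<String, Long>();
        for (Map<String, Long> row : replicaRows)
            for (Map.Entry<String, Long> col : row.entrySet())
            {
                Long existing = merged.get(col.getKey());
                if (existing == null || col.getValue() > existing)
                    merged.put(col.getKey(), col.getValue());
            }
        return merged; // replicas then get merged minus what they already had
    }
}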

We don't do background Read Repair on range scans; they do have foreground digest checking, though.

What CL are you using?

begin crazy theory:

	Could there be a very big row that is out of sync? The increased RR would result in mutations being sent back to the replicas, which would give you a hot spot in mutations.
	Check max compacted row size on the hot nodes.
	Turn the logging up to DEBUG on the hot machines for o.a.c.service.RowRepairResolver and look for the "resolve:…" message; it includes the time taken. (Example commands below.)
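
For example (rough illustration, assuming a 0.8-style install; adjust host names and paths to your setup):

# max compacted row size, per column family, on a hot node
nodetool -h <hot-node> cfstats | grep -i "compacted row maximum"

# in conf/log4j-server.properties on the hot nodes, then restart
# (or change the level over JMX if your version supports it):
log4j.logger.org.apache.cassandra.service.RowRepairResolver=DEBUG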


Aaron Morton
Freelance Cassandra Developer

On 23/08/2011, at 7:52 PM, Jeremy Hanna wrote:

> On Aug 23, 2011, at 2:25 AM, Peter Schuller wrote:
>>> We've been having issues where as soon as we start doing heavy writes (via hadoop)
>>> recently, it really hammers 4 nodes out of 20.  We're using random partitioner and we've
>>> set the initial tokens for our 20 nodes according to the general spacing formula, except
>>> for a few token offsets as we've replaced dead nodes.
>> Is the hadoop job iterating over keys in the cluster in token order
>> perhaps, and you're generating writes to those keys? That would
>> explain a "moving hotspot" along the cluster.
> Yes - we're iterating over all the keys of particular column families, doing joins using
> pig as we enrich and perform measure calculations.  When we write, we're usually writing
> out for a certain small subset of keys which shouldn't have hotspots with RandomPartitioner
> afaict.
>> -- 
>> / Peter Schuller (@scode on twitter)
