cassandra-user mailing list archives

From aaron morton <>
Subject Re: inconsistent hadoop/cassandra results
Date Tue, 08 Jan 2013 20:42:22 GMT
Assuming there were no further writes, running repair or using CL ALL should have fixed it.

Can you describe the inconsistency between runs? 


Aaron Morton
Freelance Cassandra Developer
New Zealand


On 8/01/2013, at 2:16 AM, Brian Jeltema <> wrote:

> I need some help understanding unexpected behavior I saw in some recent experiments with
> Cassandra 1.1.5 and Hadoop 1.0.3:
> I've written a small map/reduce job that simply counts the number of columns in each
> row of a static CF (call it Foo) and generates a list of every row and column count.
> A relatively small fraction of the rows have a large number of columns; worst case is
> approximately 36 million. So when I set up the job, I used wide-row support:
>     ConfigHelper.setInputColumnFamily(job.getConfiguration(), "fooKS", "Foo", WIDE_ROWS); // where WIDE_ROWS == true
> When I ran this job using the default CL (1) I noticed that the results varied from run
> to run, which I attributed to inconsistent replicas, since Foo was generated with
> CL == 1 and the RF == 3.
> So I ran repair for that CF on every node. The Cassandra log on every node contains lines
> similar to:
>   INFO [AntiEntropyStage:1] 2013-01-05 20:38:48,605 (line 778) [repair #e4a1d7f0-579d-11e2-0000-d64e0a75e6df] Foo is fully synced
> However, repeated runs were still inconsistent. Then I set CL to ALL, which I presumed
> would always result in identical output, but repeated runs initially continued to be
> inconsistent. However, I noticed that the results seemed to be converging, and after
> several runs (somewhere between 4 and 6) I finally was producing identical results on
> every run.
> Then I set CL to QUORUM, and again generated inconsistent results.
> Does this behavior make sense?
> Brian
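
For readers following the consistency-level reasoning in the quoted message, here is a minimal sketch of the replica-overlap rule it relies on: a read is only guaranteed to touch a replica holding the latest acknowledged write when (read replicas + write replicas) > RF. The class and method names (`ReplicaOverlap`, `quorum`, `overlaps`) are hypothetical, and this is plain arithmetic rather than Cassandra or Hadoop API code. The numbers are the ones from the message (RF == 3, data written at CL ONE).

```java
// Sketch only: replica-overlap arithmetic, not Cassandra API code.
// All names here are hypothetical and chosen for illustration.
final class ReplicaOverlap {

    /** Number of replicas that form a quorum for the given replication factor. */
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    /**
     * True when any set of readReplicas replicas must share at least one
     * replica with any set of writeReplicas replicas, i.e. R + W > RF.
     */
    static boolean overlaps(int readReplicas, int writeReplicas, int rf) {
        return readReplicas + writeReplicas > rf;
    }

    public static void main(String[] args) {
        int rf = 3;      // RF == 3, as stated in the message
        int written = 1; // Foo was generated with CL == 1

        System.out.println("read ONE:    " + overlaps(1, written, rf));          // false
        System.out.println("read QUORUM: " + overlaps(quorum(rf), written, rf)); // false
        System.out.println("read ALL:    " + overlaps(rf, written, rf));         // true
    }
}
```

Under this rule, only CL ALL reads are guaranteed to overlap a CL ONE write at RF 3. The rule says nothing about why results still varied after repair reported Foo as fully synced, which is the open question in this thread.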
