cassandra-user mailing list archives

From Jeremy Hanna
Subject Re: hadoop consistency level
Date Thu, 18 Oct 2012 21:31:15 GMT

On Oct 18, 2012, at 3:52 PM, Andrey Ilinykh wrote:

> On Thu, Oct 18, 2012 at 1:34 PM, Michael Kjellman wrote:
>> Not sure I understand your question (if there is one..)
>> You are more than welcome to do CL ONE and assuming you have hadoop nodes
>> in the right places on your ring things could work out very nicely. If you
>> need to guarantee that you have all the data in your job then you'll need
>> to use QUORUM.
>> If you don't specify a CL in your job config it will default to ONE (at
>> least that's what my read of the ConfigHelper source for 1.1.6 shows)
> I have two questions.
> 1. I can benefit from data locality (and Hadoop) only with CL ONE. Is
> it correct?

Yes, and at QUORUM it's quasi-local.  The job tracker finds out where a range is and sends
a task to a replica with the data (local).  In the case of CL.QUORUM (see the Read Path section
of, it will do an actual read of the
data on the node closest (local).  Then it will get a digest from other nodes to verify that
they have the same data.  So in the case of RF=3 and QUORUM, it will read the data on the
local node where the task is running and will check the next closest replica for a digest
to verify that it is consistent.  Information is sent across the wire and there is the latency
of that, but it's not the data that's sent.
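
As a rough illustration of the read described above, here is a small sketch of a QUORUM read with
RF=3: one full data read from the closest replica plus a digest comparison against the next
closest.  The function names, the dict-based replicas, and hashing the raw value with MD5 are
assumptions made for the example; this is a toy model, not Cassandra's actual read path.

```python
import hashlib


def quorum(replication_factor):
    """Number of replicas that must answer a QUORUM read."""
    return replication_factor // 2 + 1


def digest(value):
    """Small fixed-size fingerprint sent instead of the data itself."""
    return hashlib.md5(value).hexdigest()


def quorum_read(replicas, key, replication_factor):
    """Toy QUORUM read.

    `replicas` is a list of dicts (key -> bytes), ordered closest first,
    so replicas[0] stands in for the node the Hadoop task runs on.
    Returns the locally read value and whether the digests agreed.
    """
    contacted = replicas[:quorum(replication_factor)]
    data = contacted[0][key]  # the only full data read, done locally
    expected = digest(data)
    # The remaining contacted replicas return only a digest for comparison.
    consistent = all(digest(r[key]) == expected for r in contacted[1:])
    return data, consistent


if __name__ == "__main__":
    ring = [{"row1": b"v2"}, {"row1": b"v2"}, {"row1": b"v1"}]
    print(quorum(3))
    print(quorum_read(ring, "row1", 3))
```

With RF=3 only two replicas are consulted, so the third replica in `ring` is never contacted.
What happens when the digests disagree is left out of the sketch.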

> 2. With CL QUORUM cassandra reads data from all replicas. In this case
> Hadoop doesn't give me any  benefits. Application running outside the
> cluster has the same performance. Is it correct?

CL QUORUM does not read data from all replicas.  Applications running outside the cluster
have to copy the data from the cluster, a much more copy/network intensive operation than
using CL.QUORUM with the built-in Hadoop support.

> Thank you,
>  Andrey
