cassandra-user mailing list archives

From Jeremy Hanna <jeremy.hanna1...@gmail.com>
Subject Re: hadoop consistency level
Date Thu, 18 Oct 2012 21:31:15 GMT

On Oct 18, 2012, at 3:52 PM, Andrey Ilinykh <ailinykh@gmail.com> wrote:

> On Thu, Oct 18, 2012 at 1:34 PM, Michael Kjellman
> <mkjellman@barracuda.com> wrote:
>> Not sure I understand your question (if there is one..)
>> 
>> You are more than welcome to do CL ONE and assuming you have hadoop nodes
>> in the right places on your ring things could work out very nicely. If you
>> need to guarantee that you have all the data in your job then you'll need
>> to use QUORUM.
>> 
>> If you don't specify a CL in your job config it will default to ONE (at
>> least that's what my read of the ConfigHelper source for 1.1.6 shows)
>> 
> I have two questions.
> 1. I can benefit from data locality (and Hadoop) only with CL ONE. Is
> it correct?

Yes, and at QUORUM it's quasi-local.  The job tracker finds out which nodes hold a range and
sends the task to a replica that has the data, so the task runs locally.  With CL.QUORUM (see
the Read Path section of http://wiki.apache.org/cassandra/ArchitectureInternals), the actual
data read happens on the closest node, which is the local one.  Cassandra then requests a
digest from other replicas to verify that they have the same data.  So with RF=3 and QUORUM,
it reads the data on the local node where the task is running and asks the next-closest
replica for a digest to verify that the two are consistent.  Something does go across the
wire, with the latency that implies, but it's a digest rather than the data itself.
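
To make the RF=3 case concrete, here is a minimal sketch of how the replicas split into one data read and the remaining digest reads.  It is a simplified model of the read path described above, not Cassandra's actual code; the function names and node names are made up for illustration.

```python
def quorum_size(replication_factor):
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def plan_quorum_read(replicas_by_proximity):
    """Split the closest quorum of replicas into a data read and digest reads.

    replicas_by_proximity is ordered closest-first; its length is the RF.
    Hypothetical helper: the closest replica serves the data, the rest of
    the quorum only return a digest.
    """
    needed = quorum_size(len(replicas_by_proximity))
    contacted = replicas_by_proximity[:needed]
    return {"data": contacted[:1], "digest": contacted[1:]}


# RF=3, with the Hadoop task running on the first replica.
plan = plan_quorum_read(["local-node", "next-closest", "farthest"])
print(plan)  # {'data': ['local-node'], 'digest': ['next-closest']}
```

With RF=3 the quorum is 2, so the third replica is not contacted for this read at all.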

> 2. With CL QUORUM cassandra reads data from all replicas. In this case
> Hadoop doesn't give me any  benefits. Application running outside the
> cluster has the same performance. Is it correct?

CL QUORUM does not read data from all replicas.  Applications running outside the cluster
have to copy the data out of the cluster, which is a much more network-intensive operation
than using CL.QUORUM with the built-in Hadoop support.
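
As a rough back-of-the-envelope comparison, the sketch below estimates the bytes that cross the network for a single QUORUM read.  It is a coarse model under stated assumptions: the 16-byte digest size and the 64 MB payload are illustrative values, and the function name is hypothetical.

```python
DIGEST_BYTES = 16  # assumption: one MD5-sized digest per digest response


def bytes_over_network(data_bytes, replication_factor, task_is_local):
    """Estimate network traffic for one QUORUM read (coarse model).

    A quorum needs floor(RF / 2) + 1 replicas: one returns the data, the
    other floor(RF / 2) return digests.  The data itself only crosses the
    network when the reader is not on a replica.
    """
    digest_traffic = (replication_factor // 2) * DIGEST_BYTES
    data_traffic = 0 if task_is_local else data_bytes
    return data_traffic + digest_traffic


payload = 64 * 1024 * 1024  # illustrative 64 MB of row data
print(bytes_over_network(payload, 3, task_is_local=True))   # 16
print(bytes_over_network(payload, 3, task_is_local=False))  # 67108880
```

In this model the digest cost is the same in both cases; the difference is entirely the data copy that an external application cannot avoid.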

> 
> Thank you,
>  Andrey

