incubator-cassandra-user mailing list archives

From Andrey Ilinykh <ailin...@gmail.com>
Subject Re: hadoop consistency level
Date Thu, 18 Oct 2012 22:42:57 GMT
On Thu, Oct 18, 2012 at 2:31 PM, Jeremy Hanna
<jeremy.hanna1234@gmail.com> wrote:
>
> On Oct 18, 2012, at 3:52 PM, Andrey Ilinykh <ailinykh@gmail.com> wrote:
>
>> On Thu, Oct 18, 2012 at 1:34 PM, Michael Kjellman
>> <mkjellman@barracuda.com> wrote:
>>> Not sure I understand your question (if there is one..)
>>>
>>> You are more than welcome to do CL ONE and assuming you have hadoop nodes
>>> in the right places on your ring things could work out very nicely. If you
>>> need to guarantee that you have all the data in your job then you'll need
>>> to use QUORUM.
>>>
>>> If you don't specify a CL in your job config it will default to ONE (at
>>> least that's what my read of the ConfigHelper source for 1.1.6 shows)
>>>
>> I have two questions.
>> 1. I can benefit from data locality (and Hadoop) only with CL ONE. Is
>> that correct?
>
> Yes, and at QUORUM it's quasi-local.  The job tracker finds out where a range is and sends
> a task to a replica with the data (local).  In the case of CL.QUORUM (see the Read Path section
> of http://wiki.apache.org/cassandra/ArchitectureInternals), it will do an actual read of the
> data on the closest node (local).  Then it will get a digest from other nodes to verify that
> they have the same data.  So in the case of RF=3 and QUORUM, it will read the data on the
> local node where the task is running and will check the next closest replica for a digest
> to verify that it is consistent.  Some information is sent across the wire, with the latency
> that entails, but it's the digest, not the data.
>
>> 2. With CL QUORUM, Cassandra reads data from all replicas. In this case
>> Hadoop doesn't give me any benefit: an application running outside the
>> cluster has the same performance. Is that correct?
>
> CL QUORUM does not read data from all replicas.  Applications running outside the cluster
> have to copy the data from the cluster, a much more copy- and network-intensive operation than
> using CL.QUORUM with the built-in Hadoop support.
>
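To make the replica arithmetic in the exchange above concrete, here is a minimal Java sketch, not Cassandra's actual code. It assumes a quorum of floor(RF/2) + 1 replicas, replicas already sorted closest-first (the first being the node running the Hadoop task), and it ignores read repair and speculative reads. The class, enum, and method names (`QuorumReadSketch`, `blockFor`, `plan`) and the replica labels are hypothetical. It shows that at RF=3 a QUORUM read performs one full data read locally plus one digest read, not a read from all three replicas.

```java
import java.util.ArrayList;
import java.util.List;

public class QuorumReadSketch {
    // Hypothetical subset of consistency levels used in this illustration.
    enum Level { ONE, QUORUM, ALL }

    // How many replicas must answer for a given level and replication factor (assumed quorum = RF/2 + 1).
    static int blockFor(Level level, int rf) {
        switch (level) {
            case ONE: return 1;
            case QUORUM: return rf / 2 + 1;
            case ALL: return rf;
            default: throw new IllegalArgumentException("unknown level: " + level);
        }
    }

    // replicasByProximity: replica labels, closest first (index 0 = node running the task).
    // The closest replica gets a full data read; the remaining required replicas only return digests.
    static List<String> plan(Level level, List<String> replicasByProximity) {
        int needed = blockFor(level, replicasByProximity.size());
        List<String> steps = new ArrayList<>();
        steps.add("data:" + replicasByProximity.get(0));
        for (int i = 1; i < needed; i++) {
            steps.add("digest:" + replicasByProximity.get(i));
        }
        return steps;
    }

    public static void main(String[] args) {
        List<String> replicas = List.of("local", "next-closest", "farthest");
        System.out.println(plan(Level.ONE, replicas));    // [data:local]
        System.out.println(plan(Level.QUORUM, replicas)); // [data:local, digest:next-closest]
    }
}
```

Only the digest from the next-closest replica crosses the network in the QUORUM case, which is why a task co-located with a replica still benefits from locality.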

Thank you very much, guys! I have a much clearer picture now.

Andrey
