cassandra-user mailing list archives

From Michael Kjellman <>
Subject Re: hadoop consistency level
Date Thu, 18 Oct 2012 20:24:43 GMT
Well, there is *some* data locality; it's just not guaranteed. My
understanding (and someone correct me if I'm wrong) is that
ColumnFamilyInputFormat implements InputSplit and the getLocations()
duce/InputSplit.html contains logic to do its best to determine which
node holds the data for that mapper.
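
To illustrate why location hints give "some" locality without a guarantee, here is a minimal sketch in plain Java. It is not Hadoop's scheduler or Cassandra's input format; `Split`, `assign`, and the node names are hypothetical stand-ins. It assumes each split carries a list of replica hosts as hints, and that a scheduler prefers a free worker on one of those hosts but falls back to any free worker otherwise.

```java
import java.util.Arrays;
import java.util.List;

// Simplified illustration only: not Hadoop's real scheduling logic.
final class LocalitySketch {

    /** Hypothetical stand-in for an input split: a token range plus replica hosts used as hints. */
    static final class Split {
        final String range;
        final List<String> locations;

        Split(String range, List<String> locations) {
            this.range = range;
            this.locations = locations;
        }
    }

    /**
     * Prefer a free worker that is listed among the split's locations.
     * If none is free, fall back to the first free worker, which then reads remotely.
     * Returns null when no worker is free at all.
     */
    static String assign(Split split, List<String> freeWorkers) {
        for (String host : split.locations) {
            if (freeWorkers.contains(host)) {
                return host; // data-local assignment
            }
        }
        return freeWorkers.isEmpty() ? null : freeWorkers.get(0); // no local worker free
    }

    public static void main(String[] args) {
        Split split = new Split("(0, 100]", Arrays.asList("node1", "node2", "node3"));

        // A replica host is free, so the task lands next to its data.
        System.out.println(assign(split, Arrays.asList("node4", "node2")));

        // No replica host is free, so the task runs elsewhere and reads over the network.
        System.out.println(assign(split, Arrays.asList("node4", "node5")));
    }
}
```

The second call is the "not guaranteed" case: the hint exists, but nothing forces the task onto a node that holds the data.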

But obviously it isn't guaranteed that all data will be on that node.

Also, for the sake of completeness, we have RF=3 on the Keyspace in

On 10/18/12 1:15 PM, "Andrey Ilinykh" <> wrote:

>On Thu, Oct 18, 2012 at 12:00 PM, Michael Kjellman
><> wrote:
>> Unless you have Brisk (however as far as I know there was one fork that
>> it working on 1.0 but nothing for 1.1 and is not being actively
>> by Datastax) or go with CFS (which comes with DSE) you are not
>> all data is on that hadoop node. You can take a look at the forks if
>> interested here: but I'd
>> be afraid to put my eggs in a basket that is certainly not super
>> anymore.
>> job.getConfiguration().set("", "QUORUM");
>> should get you started.
>This is what I don't understand. With QUORUM you read data from at
>least two nodes. If so, you don't benefit from data locality. What's
>the point of using Hadoop? I can run an application on any machine(s) and
>iterate through the column family. What is the difference?
>Thank you,
>  Andrey
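
For reference, the "at least two nodes" figure follows from the usual quorum formula, floor(RF / 2) + 1, applied to the RF=3 mentioned in this thread. A minimal sketch of that arithmetic (the class and method names are made up for illustration, and a single data center is assumed):

```java
// Illustrative arithmetic only; QuorumSketch is a hypothetical helper, not a Cassandra API.
final class QuorumSketch {

    /** Number of replicas that must respond at QUORUM: floor(RF / 2) + 1. */
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1; // integer division rounds down
    }

    public static void main(String[] args) {
        int rf = 3; // RF=3, as stated earlier in this thread
        System.out.println("RF=" + rf + " -> QUORUM needs " + quorum(rf) + " replicas");
    }
}
```

With RF=3 this gives 2, so even when the mapper runs on a node that holds one replica, at least one other node has to take part in each read.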
