incubator-cassandra-user mailing list archives

From Ben Browning <ben...@gmail.com>
Subject Re: Hadoop over Cassandra
Date Tue, 18 May 2010 11:16:58 GMT
Maxim,

Check out the getLocation() method from this file:

http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/hadoop/ColumnFamilyRecordReader.java

Basically, it loops over the list of nodes that hold this split's
data, and if one of them is the local node, it returns that node.
Otherwise it returns the first node in the list.
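
To make that selection rule concrete, here is a minimal sketch of it. This is not the code from ColumnFamilyRecordReader; the class name LocalitySketch, the method pickLocation, and the idea of passing the local addresses in as plain strings are all assumptions made so the example runs without a network or a Cassandra dependency.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical illustration of "prefer the local replica, else take the first".
class LocalitySketch {

    /**
     * @param replicaHosts   nodes that hold the split's data, in the order reported
     * @param localAddresses addresses of the machine running the task
     *                       (assumption: compared as plain strings)
     * @return the local node if it holds the data, otherwise the first replica
     */
    static String pickLocation(List<String> replicaHosts, Collection<String> localAddresses) {
        if (replicaHosts.isEmpty()) {
            throw new IllegalArgumentException("split has no replica hosts");
        }
        for (String host : replicaHosts) {
            if (localAddresses.contains(host)) {
                return host; // data is on this machine: read it locally
            }
        }
        return replicaHosts.get(0); // no local copy: fall back to the first replica
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        System.out.println(pickLocation(replicas, Arrays.asList("10.0.0.2")));
        System.out.println(pickLocation(replicas, Arrays.asList("10.0.0.9")));
    }
}
```

The fallback only matters when Hadoop could not schedule the task on a node that holds the data; in that case the read goes over the network to the first replica.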

The code that creates the splits of data and figures out which node
each split is located on is here:

http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
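
As for how Hadoop learns about the placement: Hadoop's InputSplit exposes getLocations(), and the scheduler uses the returned host names as a hint for where to run each map task. The sketch below shows the general shape of that idea only. It does not reproduce ColumnFamilyInputFormat; SplitSketch, RangeSplit, buildSplits, and the map from ring tokens to replica hosts are hypothetical names and inputs, and a real input format would get the ring layout from the cluster rather than from a hand-built map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;

// Hypothetical illustration: one split per token range, each tagged with its replica hosts.
class SplitSketch {

    /** Stand-in for a Hadoop input split: a token range plus the hosts that hold it. */
    static final class RangeSplit {
        final String startToken;
        final String endToken;
        private final String[] hosts;

        RangeSplit(String startToken, String endToken, String[] hosts) {
            this.startToken = startToken;
            this.endToken = endToken;
            this.hosts = hosts;
        }

        /** Mirrors the role of InputSplit.getLocations(): where this split's data lives. */
        String[] getLocations() {
            return hosts.clone();
        }
    }

    /**
     * Assumption: keys are ring tokens in ring order, and each value lists the
     * replicas for the range (previous token, this token], wrapping at the start.
     */
    static List<RangeSplit> buildSplits(LinkedHashMap<String, List<String>> replicasByEndToken) {
        List<String> tokens = new ArrayList<>(replicasByEndToken.keySet());
        List<RangeSplit> splits = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String start = tokens.get((i + tokens.size() - 1) % tokens.size());
            String end = tokens.get(i);
            String[] hosts = replicasByEndToken.get(end).toArray(new String[0]);
            splits.add(new RangeSplit(start, end, hosts));
        }
        return splits;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, List<String>> ring = new LinkedHashMap<>();
        ring.put("0", Arrays.asList("node-a", "node-b"));
        ring.put("100", Arrays.asList("node-b", "node-c"));
        ring.put("200", Arrays.asList("node-c", "node-a"));
        for (RangeSplit s : buildSplits(ring)) {
            System.out.println("(" + s.startToken + ", " + s.endToken + "] on "
                    + Arrays.toString(s.getLocations()));
        }
    }
}
```

With Hadoop running on every Cassandra machine, the host names returned this way match TaskTracker hosts, which is what lets the scheduler place each map task next to its data.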


Ben

On Tue, May 18, 2010 at 3:42 AM, Maxim Grinev <maxim@grinev.net> wrote:
>
> On Tue, May 18, 2010 at 2:23 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> On Mon, May 17, 2010 at 4:12 PM, Vick Khera <vivek@khera.org> wrote:
>> > On Mon, May 17, 2010 at 3:46 PM, Jonathan Ellis <jbellis@gmail.com>
>> > wrote:
>> >> Moving to the user@ list.
>> >>
>> >> http://wiki.apache.org/cassandra/HadoopSupport should be useful.
>> >
>> > That document doesn't really answer the "is data locality preserved"
>> > when running the map phase, but my hunch is "no".
>>
>> The answer is, "yes, as long as you have hadoop on all the cassandra
>> machines." (the case where it's easy to map cassandra locality to
>> hadoop locality :)
>
> Jonathan,
> could you please clarify this. I also cannot understand how it works. Even
> if Hadoop is deployed on all the Cassandra machines, how will Hadoop be
> aware of Cassandra's data placement (partitioning and replication)?
> Maxim
>
>
