incubator-cassandra-user mailing list archives

From "Stu Hood" <stu.h...@rackspace.com>
Subject Re: Hadoop over Cassandra
Date Tue, 18 May 2010 17:51:03 GMT
The Hadoop integration (as demonstrated by contrib/word_count) is locality-aware: it begins
by querying Cassandra to generate locality-aware splits, and when the hostnames match up between
the Hadoop and Cassandra clusters, map tasks can be scheduled on the nodes that hold their data.
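
To make the hostname matching concrete, here is a minimal sketch of the idea. It is not the actual Cassandra input format or the Hadoop scheduler. The ring layout, the hostnames, and the names Split, make_splits and assign are all hypothetical, chosen only for illustration. The one real detail it leans on is that a Hadoop InputSplit reports a list of preferred hostnames through getLocations(), and the scheduler tries to run each map task on one of them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Split:
    """One unit of map input: a token range plus the hosts holding its replicas."""
    start_token: str
    end_token: str
    locations: Tuple[str, ...]  # plays the role of InputSplit.getLocations()


def make_splits(ring: List[Tuple[str, str, List[str]]]) -> List[Split]:
    """Build one split per token range, tagged with that range's replica hosts.

    `ring` is a hypothetical description of the cluster: (start, end, replicas).
    In the real integration this information comes from querying Cassandra.
    """
    return [Split(start, end, tuple(hosts)) for start, end, hosts in ring]


def assign(splits: List[Split], workers: List[str]) -> Dict[Split, Tuple[str, bool]]:
    """Toy scheduler: prefer a worker whose hostname appears in the split's locations.

    Returns, per split, the chosen worker and whether the read would be local.
    Falls back to the least-loaded worker when no hostname matches.
    """
    load = {w: 0 for w in workers}
    assignment = {}
    for split in splits:
        local = [w for w in workers if w in split.locations]
        candidates = local or workers
        chosen = min(candidates, key=lambda w: load[w])
        load[chosen] += 1
        assignment[split] = (chosen, bool(local))
    return assignment


# Hypothetical three-node ring with replication factor 2.
ring = [
    ("0", "100", ["cass1", "cass2"]),
    ("100", "200", ["cass2", "cass3"]),
    ("200", "0", ["cass3", "cass1"]),
]

# Hadoop workers run on the Cassandra hosts: every split finds a local worker.
colocated = assign(make_splits(ring), ["cass1", "cass2", "cass3"])

# Hadoop workers run elsewhere: no hostname matches, so every read is remote.
separate = assign(make_splits(ring), ["hadoop1", "hadoop2"])
```

In this sketch the job still runs when the hostnames do not match; it only loses locality, because no worker name appears in any split's location list.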

-----Original Message-----
From: "Maxim Grinev" <maxim@grinev.net>
Sent: Tuesday, May 18, 2010 2:42am
To: user@cassandra.apache.org
Subject: Re: Hadoop over Cassandra

On Tue, May 18, 2010 at 2:23 AM, Jonathan Ellis <jbellis@gmail.com> wrote:

> On Mon, May 17, 2010 at 4:12 PM, Vick Khera <vivek@khera.org> wrote:
> > On Mon, May 17, 2010 at 3:46 PM, Jonathan Ellis <jbellis@gmail.com>
> wrote:
> >> Moving to the user@ list.
> >>
> >> http://wiki.apache.org/cassandra/HadoopSupport should be useful.
> >
> > That document doesn't really answer the "is data locality preserved"
> > when running the map phase, but my hunch is "no".
>
> The answer is, "yes, as long as you have hadoop on all the cassandra
> machines." (the case where it's easy to map cassandra locality to
> hadoop locality :)


Jonathan,

Could you please clarify this? I also cannot understand how it works. Even
if Hadoop is deployed on all the Cassandra machines, how will Hadoop be
aware of Cassandra's data placement (partitioning and replication)?

Maxim


