The Hadoop integration (as demonstrated by contrib/word_count) is locality aware: it begins
by querying Cassandra to generate locality aware splits, and when the hostnames match up between
the Hadoop and Cassandra clusters, the data can be mapped locally.
-----Original Message-----
From: "Maxim Grinev" <maxim@grinev.net>
Sent: Tuesday, May 18, 2010 2:42am
To: user@cassandra.apache.org
Subject: Re: Hadoop over Cassandra
On Tue, May 18, 2010 at 2:23 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> On Mon, May 17, 2010 at 4:12 PM, Vick Khera <vivek@khera.org> wrote:
> > On Mon, May 17, 2010 at 3:46 PM, Jonathan Ellis <jbellis@gmail.com>
> wrote:
> >> Moving to the user@ list.
> >>
> >> http://wiki.apache.org/cassandra/HadoopSupport should be useful.
> >
> > That document doesn't really answer the "is data locality preserved"
> > when running the map phase, but my hunch is "no".
>
> The answer is, "yes, as long as you have hadoop on all the cassandra
> machines." (the case where it's easy to map cassandra locality to
> hadoop locality :)
Jonathan,
could you please clarify this. I also cannot understand how it works. Even
if Hadoop is deployed on all the Cassandra machines, how will Hadoop be
aware of Cassandra's data placement (partitioning and replication)?
Maxim
|