cassandra-commits mailing list archives

From "Stu Hood (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-1124) Improve Cassandra to MapReduce locality sharing
Date Tue, 25 May 2010 01:18:23 GMT


Stu Hood commented on CASSANDRA-1124:

On closer inspection, there doesn't appear to be any way to specify the rack/datacenter from
the InputFormat. Hadoop uses a DNSToSwitchMapping to resolve a hostname's rack location: implementations
don't always use DNS, but they always run on the JobTracker.

There appear to be two options for running Hadoop and Cassandra together with good locality: run Hadoop TaskTrackers
on all of the Cassandra nodes (no need for datanodes), or extend/script a DNSToSwitchMapping
that makes RPC calls to Cassandra nodes for EndPointSnitch information.
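
As a rough illustration of the scripted option, Hadoop's script-based mapping invokes an external script with one or more hostnames/IPs as arguments and reads back one rack path per argument, separated by whitespace. The sketch below follows that contract. Everything Cassandra-specific in it is an assumption: SNITCH_INFO and lookup_snitch are hypothetical stand-ins for the RPC call to a Cassandra node, and the "/datacenter/rack" path layout and the fallback location are illustrative choices, not something the issue specifies.

```python
#!/usr/bin/env python
"""Sketch of a Hadoop topology script that reports Cassandra rack/datacenter."""
import sys

# Assumed fallback for hosts the snitch does not know about. It uses the
# same depth as the "/datacenter/rack" paths so all results look alike.
DEFAULT_LOCATION = "/default-dc/default-rack"

# Hypothetical stand-in for EndPointSnitch data. A real script would fetch
# this from a Cassandra node over RPC instead of hard-coding it.
SNITCH_INFO = {
    "10.0.1.1": ("dc1", "rack1"),
    "10.0.1.2": ("dc1", "rack2"),
    "10.0.2.1": ("dc2", "rack1"),
}


def lookup_snitch(host):
    """Return (datacenter, rack) for a host, or None if it is unknown."""
    return SNITCH_INFO.get(host)


def resolve(hosts, lookup=lookup_snitch):
    """Map each host to a network path, preserving the input order."""
    paths = []
    for host in hosts:
        location = lookup(host)
        if location is None:
            paths.append(DEFAULT_LOCATION)
        else:
            datacenter, rack = location
            paths.append("/%s/%s" % (datacenter, rack))
    return paths


if __name__ == "__main__":
    # Hadoop passes the hosts as arguments and expects one path per host.
    print(" ".join(resolve(sys.argv[1:])))
```

In Hadoop releases of this era such a script would be wired in on the JobTracker through the topology.script.file.name property. Calling resolve(["10.0.1.1", "10.9.9.9"]) with the sample table returns ["/dc1/rack1", "/default-dc/default-rack"].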

> Improve Cassandra to MapReduce locality sharing
> -----------------------------------------------
>                 Key: CASSANDRA-1124
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Jeremy Hanna
>            Priority: Minor
> Currently, the Hadoop integration only passes the data's local node information (ColumnFamilyRecordReader-RowIterator-getLocation).
> Hadoop can take advantage of full locality, and it's possible that we have full locality configured
> in Cassandra.
> So this improvement is for adding the full locality of the data into the String in a
> way that Hadoop can make use of it with its Job/Task Trackers.
> This would allow jobs to run on the same rack and/or datacenter as their data where possible.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
