incubator-cassandra-user mailing list archives

From Jeremy Hanna
Subject Re: Map-Reduce on top of cassandra
Date Mon, 14 Mar 2011 18:41:51 GMT
Just to update this thread: Orr didn't yet have TaskTrackers running on the Cassandra nodes, so most of the time was likely spent copying the data (~100G per node) over to the Hadoop cluster before any processing could start. They're going to try again after installing TaskTrackers on the Cassandra nodes.
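Why co-locating TaskTrackers matters can be sketched as a small simulation. Cassandra's Hadoop input format reports, for each input split, the hosts that hold that token range, and Hadoop prefers to run the map task on one of those hosts. The plain-Java sketch below is hypothetical (the `Split` type, host names, and `localSplits` helper are illustrative, not Cassandra or Hadoop APIs): it only counts how many splits could be read locally for a given set of TaskTracker hosts.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not a Hadoop/Cassandra API): counts how many input
 * splits have at least one replica on a host that runs a TaskTracker.
 * Splits without such a host must pull their data over the network.
 */
public class LocalitySketch {

    /** Hypothetical split: a token range plus the hosts that replicate it. */
    record Split(String range, List<String> replicaHosts) {}

    /** Returns the number of splits readable locally by some TaskTracker. */
    static int localSplits(List<Split> splits, Set<String> taskTrackerHosts) {
        int local = 0;
        for (Split s : splits) {
            for (String host : s.replicaHosts()) {
                if (taskTrackerHosts.contains(host)) {
                    local++;
                    break; // one local replica is enough for this split
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        List<Split> splits = List.of(
            new Split("r1", List.of("cass1", "cass2")),
            new Split("r2", List.of("cass2", "cass3")),
            new Split("r3", List.of("cass3", "cass1")));

        // TaskTrackers only on a separate Hadoop cluster: no split is local.
        System.out.println(localSplits(splits, Set.of("hadoop1", "hadoop2"))); // 0
        // TaskTrackers installed on the Cassandra nodes: every split is local.
        System.out.println(localSplits(splits, Set.of("cass1", "cass2", "cass3"))); // 3
    }
}
```

In the first case every byte a mapper reads crosses the network, which matches the slowdown described above; in the second, each mapper can read from the Cassandra node it runs on.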

On Mar 14, 2011, at 10:06 AM, Or Yanay wrote:

> Hi All,
> I am trying to write some map-reduce tasks so I can find out things like how many records have X status.
> I am using 0.7.0 and have 5 nodes with ~100G of data on each node.
> I have written the code based on the word_count example, and the map-reduce job runs successfully BUT is extremely slow (about 2 hours for the simplest key count).
> I am now looking to track down the slowness and tune my process, or to explore alternative ways to achieve the same goal.
> Can anyone point me to a way to tune my map-reduce job?
> Does anyone have experience exploring Cassandra data with a Hadoop cluster configuration? (As suggested in
> Thanks,
> Orr
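The word_count pattern Orr mentions maps directly onto "how many records have X status": the mapper emits `(status, 1)` per row and the reducer sums per key. The sketch below simulates the map, shuffle, and reduce phases in plain Java with in-memory rows; it deliberately uses no Hadoop or Cassandra classes, and the `status` column name and row shape are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Plain-Java sketch of the word_count pattern applied to counting records
 * by status. Rows are modeled as column-name -> value maps (an assumption);
 * the map and reduce phases are simulated in memory, without Hadoop.
 */
public class StatusCountSketch {

    /** Map phase: emit (status, 1) for every row that has the status column. */
    static List<Map.Entry<String, Integer>> map(List<Map<String, String>> rows, String statusColumn) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (Map<String, String> row : rows) {
            String status = row.get(statusColumn);
            if (status != null) { // skip rows that lack the status column
                emitted.add(Map.entry(status, 1));
            }
        }
        return emitted;
    }

    /** Shuffle + reduce phase: group emitted pairs by key and sum the values. */
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> emitted) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> e : emitted) {
            counts.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
            Map.of("status", "X", "owner", "a"),
            Map.of("status", "Y"),
            Map.of("status", "X"),
            Map.of("owner", "b"));
        System.out.println(reduce(map(rows, "status"))); // {X=2, Y=1}
    }
}
```

Because summing is associative, in a real Hadoop job the same reducer logic can also be registered as a combiner (`Job.setCombinerClass`), so each map task pre-aggregates its counts and far less data is shuffled to the reducers.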
