incubator-cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: Cassandra & MapReduce/Storm/ etc
Date Mon, 12 May 2014 08:59:04 GMT
> Is there a good blog/article that describes how to use MapReduce on a Cassandra table?
The best way to get into Cassandra and Hadoop is to play with DataStax Enterprise (DSE).

It’s free for development but paid for production, and it is an easy way to learn about the
Hadoop integration without having to worry about the installation process.

http://www.datastax.com/docs/datastax_enterprise3.1/solutions/about_hadoop

> If a database table is the input source for MapReduce or Storm, then as I understand it, in the
> simple case this translates to a full table scan of the input table, which can time out and is
> generally not a recommended access pattern in Cassandra.
The Hadoop integration is token aware: it splits the job into tasks that run locally on each
node, and each task then scans only the token range local to that node.
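
As a rough illustration of the token-aware splitting described above, here is a minimal Python sketch. It is not the actual Hadoop integration code. It assumes the Murmur3 partitioner's token bounds, one token per node (no vnodes), and it ignores replication; the names `Split`, `primary_ranges`, `plan_splits` and `split_query`, and the node names and tokens, are all hypothetical. The only Cassandra feature it relies on is the CQL `token()` function, which lets a query be restricted to a token range.

```python
from dataclasses import dataclass

# Murmur3Partitioner token bounds (assumed partitioner for this sketch).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


@dataclass(frozen=True)
class Split:
    node: str   # node that owns the range and should run the task
    start: int  # exclusive lower token bound
    end: int    # inclusive upper token bound


def primary_ranges(node_tokens):
    """Return (node, start, end) ranges for a ring with one token per node."""
    ordered = sorted(node_tokens.items(), key=lambda item: item[1])
    ranges = []
    prev_token = ordered[-1][1]
    for node, token in ordered:
        if prev_token < token:
            ranges.append((node, prev_token, token))
        else:
            # The lowest-token node owns the range that wraps around the
            # ring; cut it into two non-wrapping pieces.
            ranges.append((node, prev_token, MAX_TOKEN))
            ranges.append((node, MIN_TOKEN, token))
        prev_token = token
    return ranges


def plan_splits(node_tokens, parts_per_range=2):
    """Cut every node's range into smaller splits, each tagged with its node."""
    splits = []
    for node, start, end in primary_ranges(node_tokens):
        width = end - start
        bounds = [start + width * i // parts_per_range
                  for i in range(parts_per_range)] + [end]
        for lo, hi in zip(bounds, bounds[1:]):
            if lo < hi:  # skip empty pieces
                splits.append(Split(node, lo, hi))
    return splits


def split_query(keyspace, table, partition_key):
    """CQL a task could run for one split; bind (start, end) to the markers."""
    return (f"SELECT * FROM {keyspace}.{table} "
            f"WHERE token({partition_key}) > ? AND token({partition_key}) <= ?")


# Hypothetical three-node ring.
ring = {"node-a": -6 * 10**18, "node-b": 0, "node-c": 6 * 10**18}
for split in plan_splits(ring):
    print(split.node, split.start, split.end)
print(split_query("my_keyspace", "my_table", "id"))
```

Each split is a bounded range scan that can be scheduled on the node owning that range, so no single query has to walk the whole table.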

Hope that helps. 
A

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 9/05/2014, at 9:43 am, Manoj Khangaonkar <khangaonkar@gmail.com> wrote:

> Hi,
> 
> Searching for Cassandra with MapReduce, I am finding that the search results are really
> dated -- from version 0.7 & 2010/2011.
> 
> Is there a good blog/article that describes how to use MapReduce on a Cassandra table?
> 
> From my naive understanding, Cassandra is all about partitioning. Querying is based on
> partition key + clustering column(s).
> 
> The input to MapReduce is a sequence of key/value pairs. For Storm it is a stream of tuples.
> 
> If a database table is the input source for MapReduce or Storm, then as I understand it, in the
> simple case this translates to a full table scan of the input table, which can time out and is
> generally not a recommended access pattern in Cassandra.
> 
> My initial reaction is that if I need to process data with MapReduce or Storm, reading
> it from Cassandra might not be the optimal way. Storing the output to Cassandra however does
> make sense.
> 
> If anyone has links to blogs or personal experience in this area, I would appreciate
> it if you could share them.
> 
> regards
> 
> 
> 

