cassandra-user mailing list archives

From: Jack Krupansky
Subject: Re: Cassandra & MapReduce/Storm/ etc
Date: Fri, 16 May 2014 16:23:03 GMT
Here’s a meetup talk on analytics using Cassandra, Storm, and Kafka:

-- Jack Krupansky

From: Manoj Khangaonkar 
Sent: Thursday, May 8, 2014 5:43 PM
Subject: Cassandra & MapReduce/Storm/ etc


Searching for Cassandra with MapReduce, I am finding that the search results are really dated
-- from version 0.7 & 2010/2011.

Is there a good blog/article that describes how to use MapReduce on a Cassandra table?

From my naive understanding, Cassandra is all about partitioning. Querying is based on the
partition key + clustering column(s).
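
To make that access pattern concrete, here is a minimal in-memory sketch of the idea. It is not Cassandra code: the `ToyTable` class and the sensor names are hypothetical, made up for illustration. It only shows why a lookup by partition key plus a clustering range is cheap: one partition is located directly, then a sorted slice is read from it.

```python
import bisect


class ToyTable:
    """Hypothetical model: partition key -> rows kept sorted by clustering key."""

    def __init__(self):
        # partition key -> (sorted clustering keys, values in the same order)
        self._partitions = {}

    def insert(self, pk, ck, value):
        keys, vals = self._partitions.setdefault(pk, ([], []))
        i = bisect.bisect_left(keys, ck)
        if i < len(keys) and keys[i] == ck:
            vals[i] = value  # same primary key: overwrite (upsert)
        else:
            keys.insert(i, ck)
            vals.insert(i, value)

    def query(self, pk, ck_from, ck_to):
        """Return rows of one partition with ck_from <= clustering key <= ck_to."""
        keys, vals = self._partitions.get(pk, ([], []))
        lo = bisect.bisect_left(keys, ck_from)
        hi = bisect.bisect_right(keys, ck_to)
        return list(zip(keys[lo:hi], vals[lo:hi]))


table = ToyTable()
table.insert("sensor-1", 3, "c")
table.insert("sensor-1", 1, "a")
table.insert("sensor-1", 2, "b")
table.insert("sensor-2", 1, "x")
print(table.query("sensor-1", 2, 3))
```

Anything that does not name a partition key has to visit every partition, which is the full-scan case.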

The input to MapReduce is a sequence of key/value pairs. For Storm, it is a stream of tuples.

If a database table is the input source for MapReduce or Storm, then in the simple case,
as I see it, this translates to a full table scan of the input table, which can time out and is generally
not a recommended access pattern in Cassandra.
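
One common way to keep a full scan from timing out is to split it by token range, so that each worker reads a bounded slice rather than the whole table in one query. The sketch below only generates the CQL text for such slices. It assumes the Murmur3Partitioner token range of -2^63 to 2^63-1; the helper names, the table `ks.events`, and the partition key column `id` are hypothetical.

```python
# Assumed token range of Murmur3Partitioner.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def token_splits(n):
    """Return n contiguous (start, end) token ranges covering the full ring."""
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // n for i in range(n)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))


def split_queries(table, pk_column, n):
    """Build one CQL range query per split; together they cover every token once."""
    queries = []
    for i, (lo, hi) in enumerate(token_splits(n)):
        # The first slice includes its lower bound; later slices exclude it
        # because the previous slice already covered that token.
        lower_op = ">=" if i == 0 else ">"
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE token({pk_column}) {lower_op} {lo} AND token({pk_column}) <= {hi}"
        )
    return queries


for q in split_queries("ks.events", "id", 4):
    print(q)
```

Each query can then be handed to a separate mapper or spout task. The number of splits is a tuning choice: more splits mean smaller, faster reads per task.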

My initial reaction is that if I need to process data with MapReduce or Storm, reading it
from Cassandra might not be the optimal way. Storing the output to Cassandra however does
make sense.

If anyone has links to blogs or personal experience in this area, I would appreciate it if you
could share them.

