cassandra-user mailing list archives

From Jeremy Hanna <>
Subject Re: cassandra data to hadoop.
Date Fri, 23 Dec 2011 17:46:23 GMT
We currently have cassandra nodes co-located with hadoop nodes and do a lot of data analytics
with it.  We've looked at brisk - brisk is still open-source and available, but datastax is putting
its resources into a closed version of brisk as part of datastax enterprise.  We'll likely be
moving to that over the next month.  That said, there's nothing wrong with just using vanilla
hadoop with cassandra either - it's worked for us for almost a year now.  Brisk/Datastax enterprise
is simpler though, especially for doing a lot with the two together.

On Dec 23, 2011, at 11:33 AM, Praveen Sadhu wrote:

> Have you tried Brisk?
> On Dec 23, 2011, at 9:30 AM, "Jeremy Hanna" <> wrote:
>> We do this all the time.  Take a look at
>> for some details - you can use mapreduce or pig to get data out of cassandra.  If it's going
>> to a separate hadoop cluster, I don't think you'd need to co-locate task trackers or data
>> nodes on your cassandra nodes - it would just need to copy over the network though.  We also
>> use oozie for job scheduling, fwiw.
>> On Dec 23, 2011, at 9:12 AM, ravikumar visweswara wrote:
>>> Hello All,
>>> I need to dump cassandra data to a hadoop cluster for further analytics.
>>> A lot of other relevant data which is not present in cassandra is already available in hdfs
>>> for analysis. Both are independent clusters right now.
>>> Is there a suggested way to get the data periodically or continuously to HDFS
>>> from cassandra? Any ideas or references will be very helpful for me.
>>> Thanks and Regards
>>> R
