incubator-cassandra-user mailing list archives

From ravikumar visweswara <talk2had...@gmail.com>
Subject Re: cassandra data to hadoop.
Date Sat, 24 Dec 2011 07:20:30 GMT
Jeremy,

We use the cloudera distribution for our hadoop cluster, and it may not be
possible to migrate to brisk quickly because of flume/hue dependencies. Did
you successfully pull data from an independent cassandra cluster and dump it
into a completely disconnected hadoop cluster? It would be really helpful if
you could elaborate on how to achieve this.

-R

On Fri, Dec 23, 2011 at 9:28 AM, Jeremy Hanna <jeremy.hanna1234@gmail.com> wrote:

> We do this all the time.  Take a look at
> http://wiki.apache.org/cassandra/HadoopSupport for some details - you can
> use mapreduce or pig to get data out of cassandra.  If it's going to a
> separate hadoop cluster, I don't think you'd need to co-locate task
> trackers or data nodes on your cassandra nodes - the data would just need
> to be copied over the network.  We also use oozie for job scheduling, fwiw.
>
> On Dec 23, 2011, at 9:12 AM, ravikumar visweswara wrote:
>
> > Hello All,
> >
> > I need to dump cassandra data to a hadoop cluster for further
> > analytics. A lot of other relevant data that is not present in cassandra
> > is already available in hdfs for analysis. Both are independent clusters
> > right now.
> > Is there a suggested way to get the data periodically or continuously
> > from cassandra into HDFS? Any ideas or references would be very helpful.
> >
> > Thanks and Regards
> > R
>
>
