incubator-cassandra-user mailing list archives

From Mohit Anchlia <mohitanch...@gmail.com>
Subject Re: cassandra data to hadoop.
Date Sat, 24 Dec 2011 16:42:09 GMT
You could read the data with a Cassandra client and write it to HDFS using the Hadoop FileSystem API.
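
A minimal sketch of that read-then-write loop, under stated assumptions. The class name CassandraToHdfsExport, the Row type, and the tab-separated output format are all hypothetical and chosen only for illustration. The sketch writes to a plain java.io.OutputStream; since Hadoop's FileSystem.create(Path) returns an FSDataOutputStream, which is an OutputStream, the stream from FileSystem.get(conf).create(new Path(...)) could be passed in as-is. How the rows are fetched (for example, paging through a column family with a client such as Hector) is left to the caller.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper: streams rows to any OutputStream as tab-separated lines. */
final class CassandraToHdfsExport {

    /** Hypothetical row shape: a row key plus column values already decoded to strings. */
    static final class Row {
        final String key;
        final List<String> values;

        Row(String key, List<String> values) {
            this.key = key;
            this.values = values;
        }
    }

    /** Escapes backslashes, tabs and newlines so every row stays on a single line. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n");
    }

    /**
     * Writes each row as "key<TAB>value1<TAB>value2..." followed by a newline.
     *
     * Assumption: in a real job, `rows` would be backed by a Cassandra client
     * that pages through the column family, and `out` would be the stream
     * returned by Hadoop's FileSystem.create(Path).
     *
     * @return the number of rows written
     */
    static long export(Iterator<Row> rows, OutputStream out) throws IOException {
        BufferedWriter writer =
                new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        long count = 0;
        while (rows.hasNext()) {
            Row row = rows.next();
            StringBuilder line = new StringBuilder(escape(row.key));
            for (String value : row.values) {
                line.append('\t').append(escape(value));
            }
            writer.write(line.toString());
            writer.write('\n');
            count++;
        }
        // Flush only; the caller owns the stream and is responsible for closing it.
        writer.flush();
        return count;
    }
}
```

Keeping the export loop independent of both client libraries means it can be tested against an in-memory stream before it is pointed at a real cluster. For a periodic dump, a scheduler could run it with a new HDFS output path on each run.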

On Fri, Dec 23, 2011 at 11:20 PM, ravikumar visweswara
<talk2hadoop@gmail.com> wrote:
> Jeremy,
>
> We use the Cloudera distribution for our Hadoop cluster, and it may not be
> possible to migrate to Brisk quickly because of Flume/Hue dependencies. Did you
> successfully pull the data from an independent Cassandra cluster and dump it into
> a completely disconnected Hadoop cluster? It would be really helpful if you could
> elaborate on how to achieve this.
>
> -R
>
>
> On Fri, Dec 23, 2011 at 9:28 AM, Jeremy Hanna <jeremy.hanna1234@gmail.com>
> wrote:
>>
>> We do this all the time.  Take a look at
>> http://wiki.apache.org/cassandra/HadoopSupport for some details - you can
>> use MapReduce or Pig to get data out of Cassandra.  If it's going to a
>> separate Hadoop cluster, I don't think you'd need to co-locate task trackers
>> or data nodes on your Cassandra nodes - the data would just have to be copied
>> over the network.  We also use Oozie for job scheduling, fwiw.
>>
>> On Dec 23, 2011, at 9:12 AM, ravikumar visweswara wrote:
>>
>> > Hello All,
>> >
>> > I need to dump Cassandra data to a Hadoop cluster for further
>> > analytics. A lot of other relevant data that is not present in Cassandra is
>> > already available in HDFS for analysis. The two are independent clusters right
>> > now.
>> > Is there a suggested way to get the data from Cassandra to HDFS, either
>> > periodically or continuously? Any ideas or references would be very helpful.
>> >
>> > Thanks and Regards
>> > R
>>
>
