incubator-cassandra-user mailing list archives

From "Brian O'Neill" <b...@alumni.brown.edu>
Subject Re: cassandra data to hadoop.
Date Fri, 23 Dec 2011 15:45:35 GMT
I'm not sure this is much help, but we actually run Hadoop jobs to load and
extract data to and from HDFS.  You can use ColumnFamilyInputFormat to scan
over the data in Cassandra and output it to a file.  That doesn't solve the
continuous case, but it should give you a batch mechanism to refresh the
data in HDFS.  I presume it's even speedier if you are running enterprise,
because the Hadoop process is collocated with Cassandra.
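
To make the "output it to a file" step a bit more concrete, here is a rough
sketch of the per-row logic such a job might run. It is plain Java with no
Hadoop or Cassandra dependencies, so treat it as an illustration rather than
our actual job. The assumptions: the input format hands the mapper a row key
plus a sorted map of columns, column values are simplified here to raw
ByteBuffers, everything is UTF-8 text, and the names RowToLine, toLine, utf8
and bytes are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper: flattens one row into a tab-separated text line,
// the kind of record a map task could write out to HDFS.
public class RowToLine {

    // Decode a buffer as UTF-8 (assumption: keys, names and values are text).
    // duplicate() keeps the caller's buffer position untouched.
    public static String utf8(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    // Wrap a string as a ByteBuffer, only used to build sample data.
    public static ByteBuffer bytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    // Produce "key<TAB>name=value<TAB>name=value..." in column order.
    public static String toLine(ByteBuffer key,
                                SortedMap<ByteBuffer, ByteBuffer> columns) {
        StringBuilder sb = new StringBuilder(utf8(key));
        for (Map.Entry<ByteBuffer, ByteBuffer> e : columns.entrySet()) {
            sb.append('\t')
              .append(utf8(e.getKey()))
              .append('=')
              .append(utf8(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample row; in a real job this would come from the input format.
        SortedMap<ByteBuffer, ByteBuffer> cols = new TreeMap<>();
        cols.put(bytes("name"), bytes("brian"));
        cols.put(bytes("city"), bytes("Philadelphia"));
        System.out.println(toLine(bytes("user42"), cols));
    }
}
```

In a real job the body of toLine would sit inside the map method, with the
job configured to read through ColumnFamilyInputFormat and write through a
text output format. The exact configuration calls depend on the Cassandra
version, so check the word count example that ships with your release.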

-brian

On Fri, Dec 23, 2011 at 10:12 AM, ravikumar visweswara <
talk2hadoop@gmail.com> wrote:

> Hello All,
>
> I need to dump Cassandra data to a Hadoop cluster for further
> analytics. A lot of other relevant data that is not present in Cassandra is
> already available in HDFS for analysis. Both are independent clusters right
> now.
> Is there a suggested way to get the data periodically or continuously to
> HDFS from Cassandra? Any ideas or references would be very helpful.
>
> Thanks and Regards
> R
>



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
