I'm not sure this is much help, but we run Hadoop jobs to move data between Cassandra and HDFS in both directions.  You can use ColumnFamilyInputFormat to scan the data in Cassandra and write it out to a file.  That doesn't solve the continuous case, but it should give you a batch mechanism for refreshing the data in HDFS.  I presume it's even faster if you are running DataStax Enterprise, because the Hadoop process is collocated with Cassandra.
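
As a rough illustration of the "output it to a file" step, here is a minimal sketch of the per-row flattening such an export might do before writing to HDFS. It is plain Python with no Cassandra or Hadoop dependency; in a real job this logic would live in the map step fed by ColumnFamilyInputFormat. The names row_to_line and export_rows, and the key-then-name=value line layout, are assumptions made for illustration, not anything from Cassandra's API.

```python
from typing import Iterable, Mapping, TextIO, Tuple


def row_to_line(row_key: str, columns: Mapping[str, str], sep: str = "\t") -> str:
    """Flatten one row into a delimited line: the key, then name=value pairs.

    Hypothetical layout: columns are emitted in sorted name order so the
    output is stable from one batch run to the next.
    """
    def clean(text: str) -> str:
        # Keep each row on one line and keep the separator unambiguous.
        return text.replace(sep, " ").replace("\n", " ")

    fields = [clean(row_key)]
    for name in sorted(columns):
        fields.append(f"{clean(name)}={clean(columns[name])}")
    return sep.join(fields)


def export_rows(rows: Iterable[Tuple[str, Mapping[str, str]]], out: TextIO) -> int:
    """Write one line per row to `out` and return the number of rows written."""
    count = 0
    for row_key, columns in rows:
        out.write(row_to_line(row_key, columns) + "\n")
        count += 1
    return count
```

A line-per-row text layout like this keeps the dump easy to join against the data already sitting in HDFS; a periodic refresh would rerun the whole export and replace the previous output.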


On Fri, Dec 23, 2011 at 10:12 AM, ravikumar visweswara <talk2hadoop@gmail.com> wrote:
Hello All,

I need to dump Cassandra data to a Hadoop cluster for further analytics. A lot of other relevant data that is not present in Cassandra is already available in HDFS for analysis. The two are independent clusters right now.
Is there a suggested way to get the data from Cassandra to HDFS, either periodically or continuously? Any ideas or references would be very helpful.

Thanks and Regards

Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/