incubator-cassandra-user mailing list archives

From Milind Parikh <>
Subject Re: ETL Tools to transfer data from Cassandra into other relational databases
Date Fri, 14 Dec 2012 04:39:54 GMT
Why would you use Cassandra as the primary store for logging information?
Have you considered Kafka?

You could, of course, then fan out the logs from Kafka to Cassandra on a
near-real-time basis and, on a daily basis (if you wish), extract the
"deltas" from Kafka into an RDBMS, with no Pig/Hive etc.
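
A minimal sketch of that daily delta step, under loud assumptions: a plain
Python list stands in for one append-only Kafka partition, sqlite3 stands in
for the target RDBMS, and the names extract_deltas, log_rows and checkpoint
are made up for illustration. The point is only that a stored offset lets
each run pick up where the last one stopped, without rescanning old records.

```python
import sqlite3


def extract_deltas(log, conn, source="weblog"):
    """Copy records appended since the last run into the relational table.

    `log` is any append-only sequence (a stand-in for a log partition).
    Returns the number of records copied in this run.
    """
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS log_rows "
        "(log_offset INTEGER PRIMARY KEY, line TEXT)"
    )
    cur.execute(
        "CREATE TABLE IF NOT EXISTS checkpoint "
        "(source TEXT PRIMARY KEY, next_offset INTEGER)"
    )

    # Read the offset saved by the previous run (0 on the first run).
    row = cur.execute(
        "SELECT next_offset FROM checkpoint WHERE source = ?", (source,)
    ).fetchone()
    start = row[0] if row else 0

    # Only the tail of the log is read; earlier records are never touched.
    delta = log[start:]
    cur.executemany(
        "INSERT INTO log_rows (log_offset, line) VALUES (?, ?)",
        [(start + i, line) for i, line in enumerate(delta)],
    )

    # Save the new offset in the same transaction as the inserted rows,
    # so a failed run does not advance the checkpoint.
    cur.execute(
        "INSERT OR REPLACE INTO checkpoint (source, next_offset) VALUES (?, ?)",
        (source, start + len(delta)),
    )
    conn.commit()
    return len(delta)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    log = ["GET /a", "GET /b"]
    print(extract_deltas(log, conn))  # first run copies everything so far
    log.append("GET /c")
    print(extract_deltas(log, conn))  # later run copies only the new record
```

With a real Kafka consumer the offset bookkeeping would come from the
client rather than from a list slice, but the checkpoint-then-resume shape
would stay the same.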



On Thu, Dec 13, 2012 at 7:19 PM, <> wrote:

> We will use Cassandra as logging storage in one of our web applications.
> The application only inserts rows into Cassandra and never updates or
> deletes any rows. The CF is expected to grow by about 0.5 million rows per day.
> We need to transfer the data in Cassandra to another relational database
> daily. Due to the large size of the CF, instead of truncating the
> relational table and reloading all rows into it each time, we plan to run a
> job to select the "delta" rows since the last run and insert them into the
> relational database.
> We know we can use Java, Pig or Hive to extract the delta rows to a flat
> file and load the data into the target relational table. We are
> particularly interested in a process that can extract delta rows without
> scanning the entire CF.
> Has anyone used any other ETL tools to do this kind of delta extraction
> from Cassandra? We appreciate any comments and experience.
> Thanks,
> Chin
