incubator-cassandra-user mailing list archives

From Шамим <>
Subject Re: ETL Tools to transfer data from Cassandra into other relational databases
Date Fri, 14 Dec 2012 08:58:54 GMT
Hello Chin,
 You can extract the delta with a Pig script and save it in another CF in Cassandra. With Pentaho
Kettle you can then load the data from that CF into the RDBMS. Pentaho Kettle is an open source project.
The whole process can be automated through Azkaban or Oozie.
Kafka is also an alternative, as mentioned above.
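
A minimal sketch of the "delta" bookkeeping, not taken from the thread: it assumes each log row carries a write timestamp and that the job stores the timestamp of the last exported row between runs. The names extract_delta, written_at and the sample rows are hypothetical, and plain Python lists stand in for the CF. Filtering a list like this still touches every row, so it only illustrates the watermark logic, not how to avoid a full scan in Cassandra.

```python
from datetime import datetime, timezone


def extract_delta(rows, last_run):
    """Return (delta, new_watermark) for rows written strictly after last_run."""
    delta = [row for row in rows if row["written_at"] > last_run]
    # If nothing new arrived, keep the old watermark so the next run
    # starts from the same point.
    new_watermark = max((row["written_at"] for row in delta), default=last_run)
    return delta, new_watermark


def utc(day, hour):
    """Helper for building sample timestamps in December 2012 (UTC)."""
    return datetime(2012, 12, day, hour, tzinfo=timezone.utc)


# Hypothetical append-only log rows; in practice these would come from the CF.
rows = [
    {"id": 1, "written_at": utc(12, 9), "msg": "login"},
    {"id": 2, "written_at": utc(13, 10), "msg": "search"},
    {"id": 3, "written_at": utc(13, 23), "msg": "logout"},
]

# The previous run exported everything up to 13 Dec 00:00 UTC.
delta, watermark = extract_delta(rows, last_run=utc(13, 0))
print([row["id"] for row in delta], watermark.isoformat())
```

The returned watermark would be persisted by the scheduler job and passed in as last_run on the next daily run, so each row reaches the relational table only once.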

14.12.2012, 07:20, "" <>:
> We will use Cassandra as logging storage in one of our web applications. The application
> only inserts rows into Cassandra and never updates or deletes any rows. The CF is expected to
> grow by about 0.5 million rows per day.
> We need to transfer the data in Cassandra to another relational database daily. Due to
> the large size of the CF, instead of truncating the relational table and reloading all rows
> into it each time, we plan to run a job to select the "delta" rows since the last run and
> insert them into the relational database.
> We know we can use Java, Pig or Hive to extract the delta rows to a flat file and load
> the data into the target relational table. We are particularly interested in a process that
> can extract delta rows without scanning the entire CF.
> Has anyone used any other ETL tools to do this kind of delta extraction from Cassandra?
> We appreciate any comments and experience.
> Thanks,
> Chin
