cassandra-user mailing list archives

From Rahul Singh <>
Subject Re: ETL options from Hive/Presto/s3 to cassandra
Date Tue, 07 Aug 2018 13:36:56 GMT
Spark scales to as many nodes as you want and can be colocated with the data nodes. sstableloader
won't be as performant for larger datasets; although it can be run in parallel on different
nodes, I don't believe it is as fault tolerant.
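A minimal PySpark sketch of the Spark route follows. It assumes the DataStax Spark Cassandra Connector is on the Spark classpath, that the session has Hive support, and that the target Cassandra table already exists with columns matching the Hive table. The helper names and the table, keyspace and host names are hypothetical placeholders, not something from this thread.

```python
"""Sketch: copy a Hive table into an existing Cassandra table with Spark.

Assumptions (not from the original thread): the DataStax Spark Cassandra
Connector is on the classpath, the SparkSession has Hive support, and the
Cassandra table already exists with a schema matching the Hive columns.
Helper names are hypothetical.
"""

# DataFrame source name registered by the Spark Cassandra Connector.
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"


def cassandra_write_options(keyspace: str, table: str) -> dict:
    # The connector's DataFrame source identifies the target by keyspace and table.
    return {"keyspace": keyspace, "table": table}


def copy_hive_to_cassandra(spark, hive_table: str, keyspace: str, table: str) -> None:
    # Read the Hive table; Spark splits the work into tasks that are
    # scheduled and retried independently across the cluster.
    df = spark.table(hive_table)
    (df.write
       .format(CASSANDRA_FORMAT)
       .options(**cassandra_write_options(keyspace, table))
       # Append inserts rows into the existing table (Cassandra writes are upserts).
       .mode("append")
       .save())
```

As an example, from a `pyspark` shell started with the connector package and `--conf spark.cassandra.connection.host=<host>`, calling `copy_hive_to_cassandra(spark, "warehouse.events", "analytics", "events")` would copy the (hypothetical) Hive table `warehouse.events` into `analytics.events`. Keeping the Spark call inside a function lets the same code run from a shell or a `spark-submit` job.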

If you have to do it continuously, I would even consider using Kafka as the transport layer
together with Kafka Connect, which brings additional tooling for getting data into Cassandra
from a variety of sources.

On Aug 6, 2018, 3:16 PM -0400, srimugunthan dhandapani <>,
> Hi all,
> We have data that lands in Hive/Presto every few hours.
> We want that data transferred to Cassandra tables.
> What are some high-performance ETL options for transferring data from Hive or Presto into Cassandra?
> Also, does anybody have performance numbers comparing
> - loading data from S3 into Cassandra using sstableloader
> - and loading data from S3 into Cassandra by other means (like the Spark API)?
> Thanks,
> mugunthan
