cassandra-user mailing list archives

From Arbab Khalil <akha...@an10.io>
Subject Re: Suggestions for migrating data from cassandra
Date Tue, 15 May 2018 20:31:21 GMT
Spark supports both C* and MySQL. For C*, you need the
datastax:spark-cassandra-connector package; for MySQL, the MySQL JDBC driver
jar. Reading and writing data in Spark is very simple.
To read a C* table, use:

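from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('cassandra_to_mysql').getOrCreate()
df = spark.read.format("org.apache.spark.sql.cassandra") \
    .options(keyspace='test', table='test_table').load()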

and to write the data to a MySQL table, use:

df.write.format('jdbc').options(
          url='jdbc:mysql://localhost/database_name',
          driver='com.mysql.jdbc.Driver',
          dbtable='DestinationTableName',
          user='your_user_name',
          password='your_password').mode('append').save()
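With hundreds of gigabytes to move, batching and parallelism on the JDBC write matter. The helper below is a minimal sketch under stated assumptions, not part of the original reply: `mysql_jdbc_options` is a hypothetical name, and the `rewriteBatchedStatements=true` URL parameter (a MySQL Connector/J property that turns batched INSERTs into multi-row statements), the batch size of 10000, and the optional `numPartitions` cap (which Spark's JDBC data source uses as the maximum number of concurrent connections in recent versions) are assumptions to tune for your data and MySQL server.

```python
from urllib.parse import urlencode


def mysql_jdbc_options(host, database, table, user, password,
                       batchsize=10000, num_partitions=None):
    """Build keyword options for df.write.format('jdbc').options(**opts)."""
    # Assumption: let Connector/J rewrite each JDBC batch into multi-row
    # INSERT statements, which usually speeds up bulk loads into MySQL.
    query = urlencode({"rewriteBatchedStatements": "true"})
    opts = {
        "url": f"jdbc:mysql://{host}/{database}?{query}",
        "driver": "com.mysql.jdbc.Driver",  # Connector/J 5.1 driver class
        "dbtable": table,
        "user": user,
        "password": password,
        # Rows sent per JDBC batch (Spark's default is 1000).
        "batchsize": str(batchsize),
    }
    if num_partitions is not None:
        # Upper bound on concurrent JDBC connections; Spark coalesces the
        # DataFrame down to this many partitions before writing.
        opts["numPartitions"] = str(num_partitions)
    return opts


# Usage, with df from the Cassandra read above:
#   opts = mysql_jdbc_options('localhost', 'database_name',
#                             'DestinationTableName', 'your_user_name',
#                             'your_password', num_partitions=8)
#   df.write.format('jdbc').options(**opts).mode('append').save()
```

Capping `numPartitions` keeps a large Spark job from opening more MySQL connections than the target server can handle; raise it only if MySQL keeps up.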

When submitting the Spark <http://spark.apache.org/> program, use the
following command:

bin/spark-submit --packages datastax:spark-cassandra-connector:2.0.7-s_2.11 \
    --jars external/mysql-connector-java-5.1.40-bin.jar \
    /path_to_your_program/spark_database.py
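The command above does not say which Cassandra node the connector should contact; the connector's `spark.cassandra.connection.host` setting controls that and defaults to localhost. The sketch below builds the same command as an argument list and adds that setting. `spark_submit_args` is a hypothetical helper, `10.0.0.1` is a placeholder address, and the `spark.cassandra.input.fetch.size_in_rows` value (rows fetched per page from Cassandra) is only an example of passing extra connector settings; check the option names against the connector version you actually use.

```python
import shlex


def spark_submit_args(program, cassandra_host, extra_conf=None,
                      connector="datastax:spark-cassandra-connector:2.0.7-s_2.11",
                      mysql_jar="external/mysql-connector-java-5.1.40-bin.jar"):
    """Build the spark-submit command above as an argv list."""
    args = ["bin/spark-submit", "--packages", connector, "--jars", mysql_jar]
    # Node the connector contacts first (it defaults to localhost).
    conf = {"spark.cassandra.connection.host": cassandra_host}
    conf.update(extra_conf or {})
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    # The application file must come after all spark-submit options.
    args.append(program)
    return args


if __name__ == "__main__":
    # Placeholder host; smaller pages mean smaller individual read requests.
    args = spark_submit_args(
        "/path_to_your_program/spark_database.py",
        "10.0.0.1",
        {"spark.cassandra.input.fetch.size_in_rows": 500},
    )
    print(shlex.join(args))
    # To launch it: import subprocess; subprocess.run(args, check=True)
```

Printing the joined command first makes it easy to review before launching a job that will scan the whole table.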

This should solve your problem and save you time.


On Tue, May 15, 2018 at 11:04 PM, kurt greaves <kurt@instaclustr.com> wrote:

> COPY might work, but over hundreds of gigabytes you'll probably run into
> issues if your cluster is overloaded. If you've got access to Spark, that
> would be an efficient way to pull down an entire table and dump it out
> using the spark-cassandra-connector.
>
> On 15 May 2018 at 10:59, Jing Meng <self.reload@gmail.com> wrote:
>
>> Hi guys, for historical reasons our Cassandra cluster is currently
>> overloaded, and operating it has become a nightmare. Anyway, (sadly)
>> we're planning to migrate the Cassandra data back to MySQL...
>>
>> So we're not quite clear on how to migrate the historical data from
>> Cassandra.
>>
>> I know there is the COPY command, but I wonder whether it works in a
>> production environment with hundreds of gigabytes of data. And if it
>> does, would it impact server performance significantly?
>>
>> Apart from that, I know the spark-connector can be used to scan data from
>> a C* cluster, but I'm not that familiar with Spark and am still not sure
>> whether writing data to a MySQL database can be done naturally with the
>> spark-connector.
>>
>> Are there any suggestions, best practices, or reading materials for doing this?
>>
>> Thanks!
>>
>
>


-- 
Regards,
Arbab Khalil
Software Design Engineer
