cassandra-user mailing list archives

From Christophe Schmitz <christo...@instaclustr.com>
Subject Re: Cassandra loading data from another table
Date Tue, 02 Oct 2018 01:33:53 GMT
Have a look at using Spark on Cassandra. It's commonly used for data
movement, data migration and reconciliation (on top of analytics). You
should get much better performance than iterating over a CSV line by line.
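
A minimal PySpark sketch of that idea, assuming the spark-cassandra-connector is on the Spark classpath. The keyspace, table and key column names passed in are placeholders, not names from this thread. It reads both tables, keeps only the rows of tableA whose primary key is absent from tableB (a left anti join), and appends those rows to tableB:

```python
# Data source name registered by the spark-cassandra-connector.
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"


def read_table(spark, keyspace, table):
    """Load one Cassandra table as a DataFrame."""
    return (
        spark.read.format(CASSANDRA_FORMAT)
        .options(keyspace=keyspace, table=table)
        .load()
    )


def copy_missing_rows(spark, keyspace, source, target, key_cols):
    """Append to `target` the rows of `source` whose key is not in `target`.

    `key_cols` must list every primary-key column (assumption: both
    tables share the same schema, as described in the thread).
    """
    src = read_table(spark, keyspace, source)
    # Only the key columns of the target are needed for the comparison.
    dst_keys = read_table(spark, keyspace, target).select(*key_cols)
    # left_anti keeps the source rows that have no match in dst_keys.
    missing = src.join(dst_keys, on=list(key_cols), how="left_anti")
    (
        missing.write.format(CASSANDRA_FORMAT)
        .options(keyspace=keyspace, table=target)
        .mode("append")
        .save()
    )
    return missing
```

It could be called as `copy_missing_rows(spark, "ks", "tablea", "tableb", ["id"])` with a SparkSession whose `spark.cassandra.connection.host` points at the cluster. The anti join sees a snapshot of tableB, so rows the application writes while the job runs can still be overwritten. Test on a copy first.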

Christophe Schmitz - Instaclustr <https://www.instaclustr.com/> - Cassandra
| Kafka | Spark Consulting





On Tue, 2 Oct 2018 at 09:58 Richard Xin <richardxin168@yahoo.com.invalid>
wrote:

> Christophe, thanks for your insights.
> Sorry, I forgot to mention that both tableA and tableB are currently being
> updated by the application (all newly inserted/updated records should be
> identical on A and B). Exporting from tableB and COPYing it back later
> would let older data overwrite newly updated data.
>
> The only approach I can think of is to COPY tableA to a CSV file, then
> iterate over the CSV line by line and insert into tableB with an
> "IF NOT EXISTS" clause to avoid downtime, but it's error-prone and slow.
> I'm not sure whether there is a better way.
> Best,
> Richard
>
> On Monday, October 1, 2018, 4:34:38 PM PDT, Christophe Schmitz <
> christophe@instaclustr.com> wrote:
>
>
> Hi Richard,
>
> You could consider exporting the few thousand records of Table B to a
> file with *COPY TO*. Then *TRUNCATE* Table B, copy the SSTable files of
> Table A to the data directory of Table B (make sure you *flush* the
> memtables first), then run nodetool *refresh*. The final step is to load
> the few thousand records back into Table B with *COPY FROM*. This will
> overwrite the data you loaded from the SSTables of Table A.
> Overall, there is no downtime on your cluster and no downtime on
> Table A, but you need to think about the consequences for Table B if your
> application is writing to Table A or Table B during this process.
> Please test first :)
>
> Cheers,
> Christophe
>
> Christophe Schmitz - Instaclustr <https://www.instaclustr.com/> -
> Cassandra | Kafka | Spark Consulting
>
>
>
>
> On Tue, 2 Oct 2018 at 09:18 Richard Xin <richardxin168@yahoo.com.invalid>
> wrote:
>
> I have tableA with a few tens of millions of records, and tableB
> with a few thousand records.
> tableA and tableB have exactly the same schema (except that tableB doesn't
> have a TTL).
>
> I want to load all data from tableA into tableB EXCEPT for rows already in
> tableB (we don't want data in tableB to be overwritten).
>
> What's the best way to accomplish this?
>
> Thanks,
>
>
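
For comparison, the CSV plus "IF NOT EXISTS" idea quoted above could be sketched roughly as follows. This assumes a `session` object from the DataStax Python driver (`cassandra-driver`) and a CSV exported without a header row. The column list and the `convert_row` helper are hypothetical: CSV fields arrive as strings and must be converted to the column types before binding. Each insert is a lightweight transaction, which is one reason this route is slow for tens of millions of rows.

```python
import csv


def build_insert_cql(keyspace, table, columns):
    """Return a parameterized INSERT ... IF NOT EXISTS statement."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    return (
        f"INSERT INTO {keyspace}.{table} ({col_list}) "
        f"VALUES ({placeholders}) IF NOT EXISTS"
    )


def load_missing_rows(session, csv_path, keyspace, table, columns, convert_row):
    """Insert each CSV row unless a row with the same key already exists.

    `convert_row` is a caller-supplied (hypothetical) function that turns
    the list of CSV strings into properly typed values, in column order.
    Returns (applied, skipped) counts.
    """
    prepared = session.prepare(build_insert_cql(keyspace, table, columns))
    applied = skipped = 0
    with open(csv_path, newline="") as handle:
        for raw in csv.reader(handle):
            result = session.execute(prepared, convert_row(raw))
            # was_applied is False when the row already existed.
            if result.was_applied:
                applied += 1
            else:
                skipped += 1
    return applied, skipped
```

The returned counts give a quick check that the number of skipped rows matches the rows already present in tableB.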
