Hi Richard,

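You could consider exporting the few thousand records of Table B to a file with COPY TO. Then TRUNCATE Table B, flush Table A's memtables (nodetool flush) so that all of its data is on disk, copy Table A's SSTable files into the data directory of Table B, and run nodetool refresh on Table B. SSTables are local to each node, so the flush, copy and refresh steps must be done on every node. The final step is to load the few thousand records back into Table B with COPY FROM. These writes are newer than the data loaded from Table A's SSTables, so they will overwrite it for any row present in both tables.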
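The steps above could be scripted roughly as follows. This is a minimal Python sketch, not a tested tool. The keyspace and table names, the CSV path and the data directory are hypothetical placeholders. It assumes Cassandra 3.0 or later, where SSTable file names do not embed the table name, so the files can be copied between table directories. It also assumes one directory per table under the data directory, and that it runs as the user that owns Cassandra's files. By default it only prints the commands.

```python
import subprocess
import sys

# Hypothetical placeholders: change these to match your cluster.
KEYSPACE = "my_ks"
TABLE_A = "table_a"
TABLE_B = "table_b"
EXPORT_CSV = "/tmp/table_b_export.csv"
DATA_DIR = "/var/lib/cassandra/data"  # assumed data_file_directories setting


def cql(statement):
    """Shell command running one statement through cqlsh (statements here contain no double quotes)."""
    return f'cqlsh -e "{statement}"'


def once_before():
    """Run once, from one node: back up Table B's rows, then empty Table B."""
    return [
        cql(f"COPY {KEYSPACE}.{TABLE_B} TO '{EXPORT_CSV}' WITH HEADER = true;"),
        cql(f"TRUNCATE {KEYSPACE}.{TABLE_B};"),
    ]


def on_each_node():
    """Run on EVERY node, because SSTables and nodetool refresh are node-local."""
    # Assumes exactly one <table>-<uuid> directory per table (no leftovers of dropped tables).
    src = f"{DATA_DIR}/{KEYSPACE}/{TABLE_A}-*"
    dst = f"{DATA_DIR}/{KEYSPACE}/{TABLE_B}-*"
    return [
        # Write Table A's memtables to disk so its SSTables hold all of its data.
        f"nodetool flush {KEYSPACE} {TABLE_A}",
        # Copy only regular files (SSTable components), skipping snapshots/ and backups/.
        f"find {src} -maxdepth 1 -type f -exec cp {{}} {dst}/ \\;",
        # Make Table B load the newly placed SSTables without a restart.
        f"nodetool refresh {KEYSPACE} {TABLE_B}",
    ]


def once_after():
    """Run once, after every node has refreshed: reload Table B's original rows.

    These writes normally carry newer timestamps than the copied data, so they win.
    """
    return [cql(f"COPY {KEYSPACE}.{TABLE_B} FROM '{EXPORT_CSV}' WITH HEADER = true;")]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(command)
        if not dry_run:
            subprocess.run(command, shell=True, check=True)


STEPS = {"before": once_before, "node": on_each_node, "after": once_after}

if __name__ == "__main__":
    # Usage: python load_table_b.py before|node|after [--execute]
    step = sys.argv[1] if len(sys.argv) > 1 else "before"
    run(STEPS[step](), dry_run="--execute" not in sys.argv)
```

Run `before` once, then `node` on each node in turn, then `after` once. Only add `--execute` after checking the printed commands on a test cluster.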
Overall, there is no downtime for your cluster or for Table A, but you need to think about the consequences for Table B if your application writes to Table A or Table B during this process.
Please test first :)

Cheers,
Christophe

Christophe Schmitz - Instaclustr - Cassandra | Kafka | Spark Consulting

On Tue, 2 Oct 2018 at 09:18 Richard Xin <richardxin168@yahoo.com.invalid> wrote:
I have tableA with a few tens of millions of records, and tableB with a few thousand records.
TableA and tableB have exactly the same schema (except that tableB doesn't have a TTL).

I want to load all data from tableA into tableB EXCEPT for the rows already in tableB (we don't want data in tableB to be overwritten).

What's the best way to accomplish this?

Thanks,