cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Leena Ghatpande <lghatpa...@hotmail.com>
Subject Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.
Date Mon, 11 Mar 2019 15:55:15 GMT
We have a table with over 70M rows with a partition key that is unique.  We have a  created
datetime stamp on each record, and we have a need to select all rows created for a date range.
Secondary index is not an option as its high cardinality and could slow performance doing
a full scan on 70M rows.


We are thinking of creating a new table with a date field as a clustering column to be able
to query for date ranges, but partition key to clustering key will be 1-1. Is this a good
approach?

To do this, we need to copy this large volume of data from table1 to table2 within the same
cluster, while updates are still happening to table1. We need to do this real time without
impacting our customers. COPY is not an option, as we have ttl's on each row on table1 that
need to be applied to table2 as well.


So what would be the best approach

  1.  To be able select data using date range without impacting performance. This operation
will be needed only on adhoc basis and it wont be as frequent .
  2.  Best way to migrate large volume of data with ttl from one table to another within the
same cluster.


Any other suggestions also will be greatly appreciated.


Mime
View raw message