Hey Cassandra folks,

I'm trying to change the schema of an existing table by creating a new one and migrating the data.

The initial table schema looks like this:
CREATE TABLE IF NOT EXISTS initial_table (
    user_id                 text,
    message_id              timeuuid,
    interaction_state       text,
    interaction_timestamp   timestamp,
    PRIMARY KEY ((user_id), message_id, interaction_state, interaction_timestamp)
);


We want to remove interaction_timestamp from the primary key: the new table has the same columns but uses PRIMARY KEY ((user_id), message_id, interaction_state).
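For clarity, the new table would look like the sketch below. It assumes the table is named new_table (the name used in the COPY FROM command) and that the column types are unchanged from initial_table:

```cql
-- Same columns as initial_table; interaction_timestamp is now a regular
-- column rather than a clustering column (table name assumed to be new_table)
CREATE TABLE IF NOT EXISTS new_table (
    user_id                 text,
    message_id              timeuuid,
    interaction_state       text,
    interaction_timestamp   timestamp,
    PRIMARY KEY ((user_id), message_id, interaction_state)
);
```

With this key, every old row that shares the same (user_id, message_id, interaction_state) is written to the same row of new_table, so only one interaction_timestamp value per key can survive.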

When importing the .csv dump taken from initial_table, the interaction_timestamp value that ends up in the new table is inconsistent. Multiple rows from the old schema need to be merged into a single row of the new schema. In most cases the last entry seems to be copied over to the new table, while in others an arbitrary one is. See the CSV sample below and the result of the COPY FROM.

initial_table_dump.csv
123,ed6c69a0-0add-11b2-8080-808080808080,DISMISSED,2020-01-03 17:50:59+0000
123,ed6c69a0-0add-11b2-8080-808080808080,DISMISSED,2020-01-10 00:05:41+0000


copy new_table(user_id, message_id, interaction_state, interaction_timestamp) from '~/initial_table_dump.csv';

Result:
 user_id | message_id                           | interaction_state | interaction_timestamp
---------+--------------------------------------+-------------------+--------------------------
     123 | ed6c69a0-0add-11b2-8080-808080808080 |         DISMISSED | 2020-01-03 17:50:59+0000

Notice that in this case the first row from the CSV is the one written to the new table. Here there are only two rows, but when more rows map to the same new key, an apparently arbitrary one gets copied over, not necessarily the first or the last. If I change the interaction_timestamp value of the second row as below, the latest entry is copied to the new table instead.
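One way to see which write survived is to check the write time Cassandra recorded for the interaction_timestamp cell. A sketch, assuming the new_table schema described above and the user_id '123' from the sample:

```cql
-- WRITETIME returns the write timestamp of the stored cell (by convention
-- microseconds since the epoch); it is only allowed on non-primary-key columns
SELECT message_id, interaction_state, interaction_timestamp,
       WRITETIME(interaction_timestamp)
FROM new_table
WHERE user_id = '123';
```

Comparing this value across re-imports shows whether the surviving cell was simply the last one written.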

initial_table_dump_2.csv
123,ed6c69a0-0add-11b2-8080-808080808080,DISMISSED,2020-01-03 17:50:59+0000
123,ed6c69a0-0add-11b2-8080-808080808080,DISMISSED,2020-01-05 00:05:41+0000

(Same COPY FROM command as above, run against initial_table_dump_2.csv.)

Result:
 user_id | message_id                           | interaction_state | interaction_timestamp
---------+--------------------------------------+-------------------+--------------------------
     123 | ed6c69a0-0add-11b2-8080-808080808080 |         DISMISSED | 2020-01-05 00:05:41+0000

Could someone help me understand why this happens? Does COPY FROM follow the row order of the CSV file during the import, or are there no ordering guarantees?

I'm using the below cqlsh and Cassandra versions:
[cqlsh 5.0.1 | Cassandra 2.2.15 | CQL spec 3.3.1 | Native protocol v4]

Thanks,
Bogdan