cassandra-user mailing list archives

From "ZAIDI, ASAD" <az1...@att.com>
Subject RE: Duplicates columns which are backed by LIST collection types
Date Thu, 24 Oct 2019 15:36:17 GMT
Did you choose the correct data type to begin with, if you don’t want duplicates?

Generally speaking:

A set and a list both represent multiple values but do so differently.
A set doesn’t preserve insertion order; its values are returned sorted in ascending order.
No duplicates are allowed.

A list preserves the order in which you append or prepend values. A list allows duplicates.
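
To make the difference concrete, here is a minimal sketch in plain Python (not CQL). It only models the behavior described above: the set view is deduplicated and sorted, while the list view keeps every value in append/prepend order. The variable names are illustrative assumptions.

```python
# Values written by an application, in write order (note the repeated "b").
writes = ["b", "a", "b", "c"]

# Set-like view: duplicates collapse and values come back in ascending order.
as_set = sorted(set(writes))

# List-like view: every write is kept, in the order it was appended.
as_list = list(writes)
as_list.insert(0, "a")  # a prepend puts the value at the front, even if it already exists

print(as_set)   # ['a', 'b', 'c']
print(as_list)  # ['a', 'b', 'a', 'b', 'c']
```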



From: Muralikrishna Gutha [mailto:muralikgutha@gmail.com]
Sent: Thursday, October 24, 2019 10:27 AM
To: user@cassandra.apache.org
Cc: Muralikrishna Gutha <muralikgutha@gmail.com>
Subject: Duplicates columns which are backed by LIST collection types

Hello Guys,

We started noticing strange behavior after we migrated one keyspace from an existing cluster
to a new cluster.

We expanded our source cluster from 18 nodes to 36 nodes and didn't run "nodetool cleanup".
We took sstable backups on the source cluster, which contained duplicate data, and restored
them (with sstableloader) onto the new cluster. Applications then started seeing duplicate data,
mostly on list-backed columns. Below is the sstable2json output for one of the list-backed columns.

Clustering Column1:Clustering Column2:mods (List collection type)
ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233

["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:d120b050eac811e9ab2729ea208ce219","eb25d0b13a6611e980b22102e728a233",1570648383445000],
["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:d120b051eac811e9ab2729ea208ce219","eb26bb113a6611e980b22102e728a233",1570648383445000],
["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:d120b052eac811e9ab2729ea208ce219","a4fcf1f1eac811e99664732b9302ab46",1570648383445000],
["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:38973560ead811e98bf68711844fec13","eb25d0b13a6611e980b22102e728a233",1570654999478000],
["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:38973561ead811e98bf68711844fec13","eb26bb113a6611e980b22102e728a233",1570654999478000],
["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:38973562ead811e98bf68711844fec13","a4fcf1f1eac811e99664732b9302ab46",1570654999478000],

Below is the select statement. I would expect Cassandra to return the data with the latest
timestamp; instead, it returns duplicate values.

select mods from keyspace.table where partition_key ='1117302' and type='ModifierList' and
id=eb26e221-3a66-11e9-80b2-2102e728a233;

[image.png]
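
As a rough illustration of why six values come back: in the sstable2json output above, each list element is a separate cell, and the six cell names all end in a different identifier (the hex string after "mods:"). If timestamps only decide between writes to the same cell name, the two batches never compete with each other. The Python sketch below models that with the six cells copied from the output. The function names are hypothetical, and this is a simplified model under that assumption, not Cassandra's actual read path.

```python
# (cell-name suffix after "mods:", value, write timestamp in microseconds),
# copied from the sstable2json output in this message.
cells = [
    ("d120b050eac811e9ab2729ea208ce219", "eb25d0b13a6611e980b22102e728a233", 1570648383445000),
    ("d120b051eac811e9ab2729ea208ce219", "eb26bb113a6611e980b22102e728a233", 1570648383445000),
    ("d120b052eac811e9ab2729ea208ce219", "a4fcf1f1eac811e99664732b9302ab46", 1570648383445000),
    ("38973560ead811e98bf68711844fec13", "eb25d0b13a6611e980b22102e728a233", 1570654999478000),
    ("38973561ead811e98bf68711844fec13", "eb26bb113a6611e980b22102e728a233", 1570654999478000),
    ("38973562ead811e98bf68711844fec13", "a4fcf1f1eac811e99664732b9302ab46", 1570654999478000),
]


def reconcile_by_cell_name(cells):
    """Keep the newest write per cell name (assumed last-write-wins model)."""
    latest = {}
    for name, value, ts in cells:
        if name not in latest or ts > latest[name][1]:
            latest[name] = (value, ts)
    return latest


def latest_per_value(cells):
    """What the poster expected: one entry per distinct value, newest timestamp."""
    latest = {}
    for _name, value, ts in cells:
        if value not in latest or ts > latest[value]:
            latest[value] = ts
    return latest


surviving = reconcile_by_cell_name(cells)
expected = latest_per_value(cells)

print(len(surviving))  # 6: every cell name is unique, so nothing is superseded
print(len(expected))   # 3: only three distinct values exist
```

Under this model, the older batch would only disappear if something explicitly removed or overwrote those three older cells; a newer timestamp on differently named cells does not do that.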

Any help or guidance is greatly appreciated.

--
Thanks & Regards
  Murali K Gutha