cassandra-user mailing list archives

From Muralikrishna Gutha <muralikgu...@gmail.com>
Subject Re: Duplicates columns which are backed by LIST collection types
Date Thu, 24 Oct 2019 15:39:19 GMT
Thanks. The list datatype has been in use for this table for a few years
now and we never had issues. We ran into this problem only recently, after
we did the keyspace migration.

Thanks,
Murali

On Thu, Oct 24, 2019 at 11:36 AM ZAIDI, ASAD <az192g@att.com> wrote:

> Did you choose the correct datatype to begin with, if you don’t want
> duplicates?
>
>
>
> Generally speaking:
>
>
>
> A set and a list both represent multiple values but do so differently.
>
> A set does not preserve insertion order; its values are returned sorted in
> ascending order. No duplicates are allowed.
>
>
>
> A list preserves the order in which values are appended or prepended.
> A list allows duplicates.
>
>
>
> *From:* Muralikrishna Gutha [mailto:muralikgutha@gmail.com]
> *Sent:* Thursday, October 24, 2019 10:27 AM
> *To:* user@cassandra.apache.org
> *Cc:* Muralikrishna Gutha <muralikgutha@gmail.com>
> *Subject:* Duplicates columns which are backed by LIST collection types
>
>
>
> Hello Guys,
>
>
>
> We started noticing strange behavior after we migrated one keyspace from
> an existing cluster to a new cluster.
>
>
>
> We expanded our source cluster from 18 nodes to 36 nodes and didn't run
> "nodetool cleanup".
>
> We took sstable backups on the source cluster, which contain duplicate
> data, and restored them (sstableloader) onto the new cluster. Applications
> then started seeing duplicate data, mostly on list-backed columns.
> Below is the sstable2json output for one of the list-backed columns.
>
>
>
> Clustering Column1:Clustering Column2:mods (List collection type)
>
> ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233
>
>
>
>
>  ["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:d120b050eac811e9ab2729ea208ce219","eb25d0b13a6611e980b22102e728a233",1570648383445000],
>
>  ["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:d120b051eac811e9ab2729ea208ce219","eb26bb113a6611e980b22102e728a233",1570648383445000],
>
>  ["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:d120b052eac811e9ab2729ea208ce219","a4fcf1f1eac811e99664732b9302ab46",1570648383445000],
>
>  ["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:38973560ead811e98bf68711844fec13","eb25d0b13a6611e980b22102e728a233",1570654999478000],
>
>  ["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:38973561ead811e98bf68711844fec13","eb26bb113a6611e980b22102e728a233",1570654999478000],
>
>  ["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:38973562ead811e98bf68711844fec13","a4fcf1f1eac811e99664732b9302ab46",1570654999478000],
>
>
>
> Below is the select statement. I would expect Cassandra to return only the
> data with the latest timestamp; instead, it returns duplicate values.
>
>
>
> select mods from keyspace.table where partition_key ='1117302' and
> type='ModifierList' and id=eb26e221-3a66-11e9-80b2-2102e728a233;
>
>
>
> [image: image.png]
>
>
>
> Any help or guidance is greatly appreciated.
>
>
>
> --
>
> Thanks & Regards
>   Murali K Gutha
>


-- 
Thanks & Regards
  Murali K Gutha
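
For anyone reading the sstable2json output quoted above, here is a minimal Python sketch (not part of the original thread) that groups the six cells by write timestamp. The helper names (timeuuid_micros, group_by_write, distinct_values) are illustrative only. It relies on two things: the last component of each cell name is a version 1 (time-based) UUID, and list elements are stored under such per-element ids rather than under their values. That means identical values written at different times are never merged the way set elements would be.

```python
import uuid

# Cells from the sstable2json output above: (cell name, value, write timestamp in µs).
# PREFIX is factored out only to keep the lines short.
PREFIX = "ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:"
CELLS = [
    (PREFIX + "d120b050eac811e9ab2729ea208ce219", "eb25d0b13a6611e980b22102e728a233", 1570648383445000),
    (PREFIX + "d120b051eac811e9ab2729ea208ce219", "eb26bb113a6611e980b22102e728a233", 1570648383445000),
    (PREFIX + "d120b052eac811e9ab2729ea208ce219", "a4fcf1f1eac811e99664732b9302ab46", 1570648383445000),
    (PREFIX + "38973560ead811e98bf68711844fec13", "eb25d0b13a6611e980b22102e728a233", 1570654999478000),
    (PREFIX + "38973561ead811e98bf68711844fec13", "eb26bb113a6611e980b22102e728a233", 1570654999478000),
    (PREFIX + "38973562ead811e98bf68711844fec13", "a4fcf1f1eac811e99664732b9302ab46", 1570654999478000),
]

# 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_micros(hex_id):
    """Return the Unix time in microseconds embedded in a version 1 UUID."""
    u = uuid.UUID(hex=hex_id)
    if u.version != 1:
        raise ValueError("not a time-based UUID: " + hex_id)
    return (u.time - UUID_EPOCH_OFFSET) // 10


def group_by_write(cells):
    """Map each write timestamp to the list values written at that time."""
    groups = {}
    for _name, value, write_ts in cells:
        groups.setdefault(write_ts, []).append(value)
    return groups


def distinct_values(cells):
    """Values in first-seen order, as if cells were keyed by value (set-like)."""
    seen = []
    for _name, value, _write_ts in cells:
        if value not in seen:
            seen.append(value)
    return seen


groups = group_by_write(CELLS)
for write_ts, values in sorted(groups.items()):
    print(write_ts, values)

# Each element id decodes to the same instant as its cell's write timestamp.
for name, _value, write_ts in CELLS:
    element_id = name.rsplit(":", 1)[1]
    print(element_id, timeuuid_micros(element_id) == write_ts)

print(len(CELLS), "cells,", len(distinct_values(CELLS)), "distinct values")
```

Read this way, the output looks like the same three values written twice, roughly 110 minutes apart, each time under fresh element ids, with both generations still present in the restored data. Why the older generation was not shadowed by the newer one cannot be determined from this output alone.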
