cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thakrar, Jayesh" <jthak...@conversantmedia.com>
Subject Re: Question: Behavior of inserting a list multiple times with same timestamp
Date Fri, 30 Jun 2017 17:46:40 GMT
Thanks Sander - this helps get a better understanding!

From: Fridtjof Sander <fridtjof.sander@googlemail.com>
Date: Friday, June 30, 2017 at 4:19 AM
To: Vladimir Yudovin <vladyu@winguzone.com>, "Thakrar, Jayesh" <jthakrar@conversantmedia.com>
Cc: Subroto Barua <sbarua116@yahoo.com>, Zhongxiang Zheng <zzheng@yahoo-corp.jp>,
"user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Question: Behavior of inserting a list multiple times with same timestamp

On insert, C* creates a tombstone for the list-column with "T-1" (it's actually -1us) as timestamp.
The second insert creates that tombstone again, but that doesn't delete the previous insert,
since it's timestamp is T not T-1us:

$ INSERT INTO test.test (k , v ) VALUES ( 1 ,[10]) USING TIMESTAMP 1000;
$ flush
$ sstabledump 1.db
  {
    "partition" : {
      "key" : [ "1" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 48,
        "liveness_info" : { "tstamp" : "1970-01-01T00:00:00.001Z" },
        "cells" : [
          { "name" : "v", "deletion_info" : { "marked_deleted" : "1970-01-01T00:00:00.000999Z",
"local_delete_time" : "2017-06-30T09:01:26Z" } },
          { "name" : "v", "path" : [ "b29084c0-5d72-11e7-a054-ebe76dac4f21" ], "value" : "1"
}
        ]
      }
    ]
  }
]
$ INSERT INTO test.test (k , v ) VALUES ( 1 ,[10]) USING TIMESTAMP 1000;
$ flush
$ sstabledump 2.db
[
  {
    "partition" : {
      "key" : [ "1" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 48,
        "liveness_info" : { "tstamp" : "1970-01-01T00:00:00.001Z" },
        "cells" : [
          { "name" : "v", "deletion_info" : { "marked_deleted" : "1970-01-01T00:00:00.000999Z",
"local_delete_time" : "2017-06-30T09:03:58Z" } },
          { "name" : "v", "path" : [ "0da21e00-5d73-11e7-a054-ebe76dac4f21" ], "value" : "1"
}
        ]
      }
    ]
  }
]

(The uuids are generated based on server timestamp I believe). That's why you get both: On
read both values are kept, since their TS is 1ms and are not deleted by the 999us tombstones.

If you are using actual, incrementing timestamps the tombstones of subsequent inserts do delete
previous values, since their timestamps are bigger.

Am 26.06.17 um 13:12 schrieb Vladimir Yudovin:
Hi,
INSERT INTO test.test (k , v ) VALUES ( 1 ,[10]) USING TIMESTAMP 1000;
INSERT INTO test.test (k , v ) VALUES ( 1 ,[20]) USING TIMESTAMP 1000;
INSERT INTO test.test (k , v ) VALUES ( 1 ,[30]) USING TIMESTAMP 1000;
SELECT * FROM test.test ;

cqlsh> SELECT * FROM test.test ;

k | v
---+-----
1 | [3]
// ===== WHY ?? =====

TIMESTAMP is measured as epoch_in_microseconds, so USING TIMESTAMP 1000 is like insert some-when
on 1970-01-01, thus it has no effect after insert without timestamp, that got current time.

Regarding your original question: it's really look strange, may be you should file JIRA about
this.

Best regards, Vladimir Yudovin,
Winguzone<https://winguzone.com?from=list> - Cloud Cassandra Hosting


---- On Tue, 20 Jun 2017 10:19:08 -0400 Thakrar, Jayesh <jthakrar@conversantmedia.com><mailto:jthakrar@conversantmedia.com>
wrote ----

Ok, tried the test again, w/o the TIMESTAMP, and got the expected behavior.
Apparently, the INSERT does replace the entire list if no timestamp is specified (as expected).
However, if the TIMESTAMP is specified, then it does (what appears to be) an append.
But found even more weird issue - see later below!

===============================================

cqlsh> CREATE TABLE test.test (k int PRIMARY KEY , v list<int>);
cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) ;
cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) ;
cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) ;
cqlsh> SELECT * FROM test.test ;

 k | v
---+-----
 1 | [1]

(1 rows)
cqlsh>

===============================================


DROP TABLE IF EXISTS test.test ;
CREATE TABLE test.test (k int PRIMARY KEY , v list<int>);
INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) ;
INSERT INTO test.test (k , v ) VALUES ( 1 ,[2]) ;
INSERT INTO test.test (k , v ) VALUES ( 1 ,[3]) ;
SELECT * FROM test.test ;

cqlsh> SELECT * FROM test.test ;

 k | v
---+-----
 1 | [3]

// ===== EXPECTED RESULT =====

INSERT INTO test.test (k , v ) VALUES ( 1 ,[10]) USING TIMESTAMP 1000;
INSERT INTO test.test (k , v ) VALUES ( 1 ,[20]) USING TIMESTAMP 1000;
INSERT INTO test.test (k , v ) VALUES ( 1 ,[30]) USING TIMESTAMP 1000;
SELECT * FROM test.test ;

cqlsh> SELECT * FROM test.test ;

 k | v
---+-----
 1 | [3]

// ===== WHY ?? =====


DROP TABLE IF EXISTS test.test ;
CREATE TABLE test.test (k int PRIMARY KEY , v list<int>);
INSERT INTO test.test (k , v ) VALUES ( 1 ,[10]) USING TIMESTAMP 1000;
INSERT INTO test.test (k , v ) VALUES ( 1 ,[20]) USING TIMESTAMP 1000;
INSERT INTO test.test (k , v ) VALUES ( 1 ,[30]) USING TIMESTAMP 1000;
SELECT * FROM test.test ;

cqlsh> SELECT * FROM test.test ;

 k | v
---+--------------
 1 | [10, 20, 30]

// ===== WHY ?? Probably the server-timestamp-uuid playing a role?! =====

INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) ;
INSERT INTO test.test (k , v ) VALUES ( 1 ,[2]) ;
INSERT INTO test.test (k , v ) VALUES ( 1 ,[3]) ;
SELECT * FROM test.test ;

cqlsh> SELECT * FROM test.test ;

 k | v
---+-----
 1 | [3]

// ===== EXPECTED RESULT =====





From: Subroto Barua <sbarua116@yahoo.com<mailto:sbarua116@yahoo.com>>
Date: Monday, June 19, 2017 at 11:09 PM
To: "Thakrar, Jayesh" <jthakrar@conversantmedia.com<mailto:jthakrar@conversantmedia.com>>,
Subroto Barua <sbarua116@yahoo.com.INVALID<mailto:sbarua116@yahoo.com.INVALID>>,
Zhongxiang Zheng <zzheng@yahoo-corp.jp<mailto:zzheng@yahoo-corp.jp>>
Cc: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Re: Question: Behavior of inserting a list multiple times with same timestamp

here is the response from Datastax support/dev:



In a list each item is its own cell. Append adds a new cell sorted at basically "current server
time uuid" prepend adds at "-current server time uuid". User supplied time stamps are used
for the cell timestamp when specified.

Inserting the entire list deletes and then inserts

Reading reads out the entire list
Positional access reads the entire list and gets/puts at the spot specified


Basically, lists are not idempotent


On Monday, June 19, 2017, 6:55:40 AM PDT, Thakrar, Jayesh <jthakrar@conversantmedia.com<mailto:jthakrar@conversantmedia.com>>
wrote:


Subroto,

Cassandra docs say otherwise.

Writing list data is accomplished with a JSON-style syntax. To write a record using INSERT,
specify the entire list as a JSON array. Note: An INSERT will always replace the entire list.

Maybe you can elaborate/shed some more light?

Thanks,
Jayesh


Lists

A list is a typed collection of non-unique values where elements are ordered by there position
in the list. To create a column of type list, use the list keyword suffixed with the value
type enclosed in angle brackets. For example:

CREATE TABLE plays (
    id text PRIMARY KEY,
    game text,
    players int,
    scores list<int>
)
Do note that as explained below, lists have some limitations and performance considerations
to take into account, and it is advised to prefer sets over lists when this is possible.

Writing list data is accomplished with a JSON-style syntax. To write a record using INSERT,
specify the entire list as a JSON array. Note: An INSERT will always replace the entire list.

INSERT INTO plays (id, game, players, scores)
          VALUES ('123-afde', 'quake', 3, [17, 4, 2]);
Adding (appending or prepending) values to a list can be accomplished by adding a new JSON-style
array to an existing list column.

UPDATE plays SET players = 5, scores = scores + [ 14, 21 ] WHERE id = '123-afde';
UPDATE plays SET players = 5, scores = [ 12 ] + scores WHERE id = '123-afde';
It should be noted that append and prepend are not idempotent operations. This means that
if during an append or a prepend the operation timeout, it is not always safe to retry the
operation (as this could result in the record appended or prepended twice).

Lists also provides the following operation: setting an element by its position in the list,
removing an element by its position in the list and remove all the occurrence of a given value
in the list. However, and contrarily to all the other collection operations, these three operations
induce an internal read before the update, and will thus typically have slower performance
characteristics. Those operations have the following syntax:

UPDATE plays SET scores[1] = 7 WHERE id = '123-afde';                // sets the 2nd element
of scores to 7 (raises an error is scores has less than 2 elements)
DELETE scores[1] FROM plays WHERE id = '123-afde';                  // deletes the 2nd element
of scores (raises an error is scores has less than 2 elements)
UPDATE plays SET scores = scores - [ 12, 21 ] WHERE id = '123-afde'; // removes all occurrences
of 12 and 21 from scores
As with maps, TTLs if used only apply to the newly inserted/updated values.

On 6/19/17, 1:12 AM, "Subroto Barua" <sbarua116@yahoo.com.INVALID<mailto:sbarua116@yahoo.com.INVALID>>
wrote:

    This is an expected behavior.

    We learned this issue/feature at the current site (we use Dse 5.08)

    Subroto

    > On Jun 18, 2017, at 10:29 PM, Zhongxiang Zheng <zzheng@yahoo-corp.jp<mailto:zzheng@yahoo-corp.jp>>
wrote:
    >
    > Hi all,
    >
    > I have a question about a behavior when insert a list with specifying timestamp.
    >
    > It is documented that "An INSERT will always replace the entire list."
    > https://github.com/apache/cassandra/blob/trunk/doc/cql3/CQL.textile#lists
    >
    > However, When a list is inserted multiple times using same timestamp,
    > it will not be replaced, but will be added as follows.
    >
    > cqlsh> CREATE TABLE test.test (k int PRIMARY KEY , v list<int>);
    > cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) USING TIMESTAMP 1000 ;
    > cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) USING TIMESTAMP 1000 ;
    > cqlsh> SELECT * FROM test.test ;
    >
    > k | v
    > ---+--------
    > 1 | [1, 1]
    >
    > I confirmed this behavior is reproduced in 3.0.13 and 3.10.
    > I'd like to ask whether this behavior is a expected behavior or a bug?
    >
    > In our use case, CQL statements with same values and timestamp will be issued multiple
times
    > to retry inserting under the assumption that insert is idempotent.
    > So, I expect that the entire list will be replace even if insert a list multiple
times with same timestamp.
    >
    > Thanks,
    >
    > Zhongxiang
    >
    >
    > ТÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÒÐÐ¥Fò
Vç7V'67&–&RÂRÖ֖â
W6W"×Vç7V'67&–&T676æG&æ6†Ræ÷&pФf÷"FF—F–öæÂ6öÖÖæG2ÂRÖ֖â
W6W"Ö†VÇ676æG&æ6†Ræ÷&pÐ





B KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKCB [ X ܚX KK[XZ[ \
\ ][ X ܚX P \ [ K \X K ܙ B ܈Y][ۘ[ [X[ K[XZ[ \ \ Z[ \ [ K \X K ܙ B



Mime
View raw message