cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Krishna Dattu Koneru (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-13547) Filtered materialized views missing data
Date Thu, 29 Jun 2017 04:57:01 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067734#comment-16067734
] 

Krishna Dattu Koneru edited comment on CASSANDRA-13547 at 6/29/17 4:56 AM:
---------------------------------------------------------------------------

Thanks [~jasonstack] ! 

I will try to rework on {{1. Missing Update}} to address your comment.

About 
 {quote}
Using the greater timestamp from view's columns(pk+non-pk) in base row will later shadow entire
row in view if there is a normal column in base as primary key in view.
{quote}

This looks like a nasty problem. This patch does not cause this. This is a existing behaviour
that any updates to view's pk columns will make old row marked as tombstone (at highest timestamp
of all columns in base row) and will create a new row with updated pk. 

See view row timestamps in the below example : 

Existing behaviour without patch:

{code}
INSERT INTO test (a, b, c, d) VALUES (1, 1, 1, 1) using timestamp 0;

test      : [1]@0 Row[info=[ts=0] ]:  | [b=1 ts=0], [c=1 ts=0], [d=1 ts=0]
mv_test1  : [1]@0 Row[info=[ts=0] ]: 1 | [c=1 ts=0], [d=1 ts=0]
{code}

{code}
UPDATE test using timestamp 5 set c = 0 WHERE a=1;

test      : [1]@0 Row[info=[ts=0] ]:  | [b=1 ts=0], [c=0 ts=5], [d=1 ts=0]
mv_test1  : [1]@0 Row[info=[ts=0] ]: 1 | [c=0 ts=5], [d=1 ts=0]
{code}

{code}
UPDATE test using timestamp 1 set b = 0 WHERE a=1;

test      : [1]@0 Row[info=[ts=0] ]:  | [b=0 ts=1], [c=0 ts=5], [d=1 ts=0]
mv_test1  :
            [1]@0 Row[info=[ts=1] ]: 0 | [c=0 ts=5], [d=1 ts=0]
            [1]@39 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498707556(shadowable)
]: 1 |
{code}
{code}

UPDATE test using timestamp 2 set b = 1 WHERE a=1;

table : [1]@0 Row[info=[ts=0] ]:  | [b=1 ts=2], [c=0 ts=5], [d=1 ts=0]

View (before compaction)
[1]@0 Row[info=[ts=1] ]: 0 | [c=0 ts=5], [d=1 ts=0]
[1]@39 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498707556(shadowable)
]: 1 |
[1]@0 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498707897(shadowable)
]: 0 |
[1]@30 Row[info=[ts=2] ]: 1 | [c=0 ts=5], [d=1 ts=0]

View (after compaction)
[1]@0 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498707897(shadowable)
]: 0 |
[1]@31 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498707556(shadowable)
]: 1 |

{code}


With this patch :

{code}

INSERT INTO test (a, b, c, d) VALUES (1, 1, 1, 1) using timestamp 0;

table : [1]@0 Row[info=[ts=0] ]:  | [b=1 ts=0], [c=1 ts=0], [d=1 ts=0]
view  : [1]@0 Row[info=[ts=0] ]: 1 | [c=1 ts=0], [d=1 ts=0]
{code}
{code}
UPDATE test using timestamp 5 set c = 0 WHERE a=1;
UPDATE test using timestamp 1 set b = 0 WHERE a=1; 

table : [1]@0 Row[info=[ts=0] ]:  | [b=0 ts=1], [c=0 ts=5], [d=1 ts=0]
view  :
        [1]@0 Row[info=[ts=5] ]: 0 | [c=0 ts=5], [d=1 ts=0] /* row ts=5 because of this patch
*/
        [1]@38 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498708886(shadowable)
]: 1 |
{code}
{code}
UPDATE test using timestamp 2 set b = 1 WHERE a=1;

table : [1]@0 Row[info=[ts=0] ]:  | [b=1 ts=2], [c=0 ts=5], [d=1 ts=0]

View (before compaction)
[1]@0 Row[info=[ts=5] ]: 0 | [c=0 ts=5], [d=1 ts=0] /* row ts=5 because of this patch */
[1]@38 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498708886(shadowable)
]: 1 |
[1]@0 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498709103(shadowable)
]: 0 |
[1]@30 Row[info=[ts=5] ]: 1 | [c=0 ts=5], [d=1 ts=0] /*-- row ts=5 because of this patch */

View (after compaction)
[1]@0 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498709103(shadowable)
]: 0 |
[1]@31 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498708886(shadowable)
]: 1 |

{code}

I am not sure yet how to fix this issue ... given that if live row and tombstone have same
timestamp , tombstone wins.

Another problem is that all view deletes are marked as shadowable. But then that is a different
problem and I belive it is being fixed in  https://issues.apache.org/jira/browse/CASSANDRA-13409
.


was (Author: krishna.koneru):
Thanks [~jasonstack] ! 

I will try to rework on {{1. Missing Update}} to address your comment.

About 
 {quote}
Using the greater timestamp from view's columns(pk+non-pk) in base row will later shadow entire
row in view if there is a normal column in base as primary key in view.
{quote}

This looks like a nasty problem. This patch does not cause this. This is a existing behaviour
that any updates to view's pk columns will make old row marked as tombstone (at highest timestamp
of all columns in base row) and will create a new row with updated pk. 

See view row timestamps in the below example : 

Existing behaviour without patch:

{code}
INSERT INTO test (a, b, c, d) VALUES (1, 1, 1, 1) using timestamp 0;

test      : [1]@0 Row[info=[ts=0] ]:  | [b=1 ts=0], [c=1 ts=0], [d=1 ts=0]
mv_test1  : [1]@0 Row[info=[ts=0] ]: 1 | [c=1 ts=0], [d=1 ts=0]
{code}

{code}
UPDATE test using timestamp 5 set c = 0 WHERE a=1;

test      : [1]@0 Row[info=[ts=0] ]:  | [b=1 ts=0], [c=0 ts=5], [d=1 ts=0]
mv_test1  : [1]@0 Row[info=[ts=0] ]: 1 | [c=0 ts=5], [d=1 ts=0]
{code}

{code}
UPDATE test using timestamp 1 set b = 0 WHERE a=1;

test      : [1]@0 Row[info=[ts=0] ]:  | [b=0 ts=1], [c=0 ts=5], [d=1 ts=0]
mv_test1  :
            [1]@0 Row[info=[ts=1] ]: 0 | [c=0 ts=5], [d=1 ts=0]
            [1]@39 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498707556(shadowable)
]: 1 |
{code}
{code}

UPDATE test using timestamp 2 set b = 1 WHERE a=1;

table : [1]@0 Row[info=[ts=0] ]:  | [b=1 ts=2], [c=0 ts=5], [d=1 ts=0]

View (before compaction)
[1]@0 Row[info=[ts=1] ]: 0 | [c=0 ts=5], [d=1 ts=0]
[1]@39 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498707556(shadowable)
]: 1 |
[1]@0 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498707897(shadowable)
]: 0 |
[1]@30 Row[info=[ts=2] ]: 1 | [c=0 ts=5], [d=1 ts=0]

View (after compaction)
[1]@0 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498707897(shadowable)
]: 0 |
[1]@31 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498707556(shadowable)
]: 1 |

{code}


With this patch :

{code}

INSERT INTO test (a, b, c, d) VALUES (1, 1, 1, 1) using timestamp 0;

table : [1]@0 Row[info=[ts=0] ]:  | [b=1 ts=0], [c=1 ts=0], [d=1 ts=0]
view  : [1]@0 Row[info=[ts=0] ]: 1 | [c=1 ts=0], [d=1 ts=0]
{code}
{code}
UPDATE test using timestamp 5 set c = 0 WHERE a=1;
UPDATE test using timestamp 1 set b = 0 WHERE a=1; 

table : [1]@0 Row[info=[ts=0] ]:  | [b=0 ts=1], [c=0 ts=5], [d=1 ts=0]
view  :
        [1]@0 Row[info=[ts=5] ]: 0 | [c=0 ts=5], [d=1 ts=0] /* row ts=5 because of this patch
*/
        [1]@38 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498708886(shadowable)
]: 1 |
{code}
{code}
UPDATE test using timestamp 2 set b = 1 WHERE a=1;

table : [1]@0 Row[info=[ts=0] ]:  | [b=1 ts=2], [c=0 ts=5], [d=1 ts=0]

View (before compaction)
[1]@0 Row[info=[ts=5] ]: 0 | [c=0 ts=5], [d=1 ts=0] /* row ts=5 because of this patch */
[1]@38 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498708886(shadowable)
]: 1 |
[1]@0 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498709103(shadowable)
]: 0 |
[1]@30 Row[info=[ts=5] ]: 1 | [c=0 ts=5], [d=1 ts=0] /*-- row ts=5 because of this patch */

View (after compaction)
[1]@0 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498709103(shadowable)
]: 0 |
[1]@31 Row[info=[ts=-9223372036854775808] del=deletedAt=5, localDeletion=1498708886(shadowable)
]: 1 |

{code}

I am not sure yet how to fix this issue ... given that if live row and tombstone have same
timestamp , tombstone wins.

Another problem is that these tombstones should not be marked as shadowable. But then that
is a different problem and I belive it is being fixed in  https://issues.apache.org/jira/browse/CASSANDRA-13409
.

> Filtered materialized views missing data
> ----------------------------------------
>
>                 Key: CASSANDRA-13547
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13547
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Materialized Views
>         Environment: Official Cassandra 3.10 Docker image (ID 154b919bf8ce).
>            Reporter: Craig Nicholson
>            Assignee: Krishna Dattu Koneru
>            Priority: Blocker
>              Labels: materializedviews
>             Fix For: 3.11.x
>
>
> When creating a materialized view against a base table the materialized view does not
always reflect the correct data.
> Using the following test schema:
> {code:title=Schema|language=sql}
> DROP KEYSPACE IF EXISTS test;
> CREATE KEYSPACE test
>   WITH REPLICATION = { 
>    'class' : 'SimpleStrategy', 
>    'replication_factor' : 1 
>   };
> CREATE TABLE test.table1 (
>                 id int,
>                 name text,
>                 enabled boolean,
>                 foo text,
>                 PRIMARY KEY (id, name));
> CREATE MATERIALIZED VIEW test.table1_mv1 AS SELECT id, name, foo
>                 FROM test.table1
>                 WHERE id IS NOT NULL 
>                 AND name IS NOT NULL 
>                 AND enabled = TRUE
>                 PRIMARY KEY ((name), id);
> CREATE MATERIALIZED VIEW test.table1_mv2 AS SELECT id, name, foo, enabled
>                 FROM test.table1
>                 WHERE id IS NOT NULL 
>                 AND name IS NOT NULL 
>                 AND enabled = TRUE
>                 PRIMARY KEY ((name), id);
> {code}
> When I insert a row into the base table the materialized views are updated appropriately.
(+)
> {code:title=Insert row|language=sql}
> cqlsh> INSERT INTO test.table1 (id, name, enabled, foo) VALUES (1, 'One', TRUE, 'Bar');
> cqlsh> SELECT * FROM test.table1;
>  id | name | enabled | foo
> ----+------+---------+-----
>   1 |  One |    True | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv1;
>  name | id | foo
> ------+----+-----
>   One |  1 | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv2;
>  name | id | enabled | foo
> ------+----+---------+-----
>   One |  1 |    True | Bar
> (1 rows)
> {code}
> Updating the record in the base table and setting enabled to FALSE will filter the record
from both materialized views. (+)
> {code:title=Disable the row|language=sql}
> cqlsh> UPDATE test.table1 SET enabled = FALSE WHERE id = 1 AND name = 'One';
> cqlsh> SELECT * FROM test.table1;
>  id | name | enabled | foo
> ----+------+---------+-----
>   1 |  One |   False | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv1;
>  name | id | foo
> ------+----+-----
> (0 rows)
> cqlsh> SELECT * FROM test.table1_mv2;
>  name | id | enabled | foo
> ------+----+---------+-----
> (0 rows)
> {code}
> However a further update to the base table setting enabled to TRUE should include the
record in both materialzed views, however only one view (table1_mv2) gets updated. (-)
> It appears that only the view (table1_mv2) that returns the filtered column (enabled)
is updated. (-)
> Additionally columns that are not part of the partiion or clustering key are not updated.
You can see that the foo column has a null value in table1_mv2. (-)
> {code:title=Enable the row|language=sql}
> cqlsh> UPDATE test.table1 SET enabled = TRUE WHERE id = 1 AND name = 'One';
> cqlsh> SELECT * FROM test.table1;
>  id | name | enabled | foo
> ----+------+---------+-----
>   1 |  One |    True | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv1;
>  name | id | foo
> ------+----+-----
> (0 rows)
> cqlsh> SELECT * FROM test.table1_mv2;
>  name | id | enabled | foo
> ------+----+---------+------
>   One |  1 |    True | null
> (1 rows)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message