cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Krishna Dattu Koneru (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-13547) Filtered materialized views missing data
Date Wed, 21 Jun 2017 06:23:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040527#comment-16040527
] 

Krishna Dattu Koneru edited comment on CASSANDRA-13547 at 6/21/17 6:22 AM:
---------------------------------------------------------------------------


{code}
cqlsh> UPDATE test.table1 SET enabled = TRUE WHERE id = 1 AND name = 'One';

cqlsh> SELECT * FROM test.table1;

 id | name | enabled | foo
----+------+---------+-----
  1 |  One |    True | Bar

(1 rows)
{code}
{code:title=Problem 1 - Missing Updates|borderStyle=solid}

cqlsh> SELECT * FROM test.table1_mv1;

 name | id | foo
------+----+-----

(0 rows)
{code}

Logic in ViewUpdateGenerator.java does not update the view row if updated column is not denormalized
in the view.
in the above case,{{enabled}} is not denormalized and so update has not propagated to the
view.View metadata only has pk columns + columns in select statement of create view.
Now that filtering on non-pk columns is supported , we have to make sure that all non-primary
key columns that have filters are denormalized.
(delete does not do this check because it does not have to. This is the reason row delete
worked when {{enabled}} is set to false.)

{code:title=Problem 2 - incorrect non-pk tombstones|borderStyle=solid}
cqlsh> SELECT * FROM test.table1_mv2;

 name | id | enabled | foo
------+----+---------+------
  One |  1 |    True | null

(1 rows)
{code}

This happens because of the way liveliness/deletion info  is computed in the view. 
{{computeTimestampForEntryDeletion())}} method takes the biggest timestamp of all the columns
(including non-pk) in the row and uses it in Deletion info when deleting.

But,when inserting/updating, {{computeLivenessInfoForEntry()}} uses the biggest timestamp
of the primary keys for liveliness info.
This causes non-pk columns to be treated as deleted because view tombstones have higher timestamp
than live cell from base row.


I have uploaded a [patch for 3.11 | https://github.com/apache/cassandra/compare/cassandra-3.11...krishna-koneru:cassandra-3.11-13547]branch
which fixes above two issues.

I will make patches for other branches if this patch looks okay.
Comments appreciated !


was (Author: krishna.koneru):
I have uploaded a patch (with unittest) for problem 1 in above comment @ https://github.com/apache/cassandra/compare/cassandra-3.11...krishna-koneru:cassandra-3.11-13547

below problem addressed by this patch  (from the description)
{code}
cqlsh> UPDATE test.table1 SET enabled = TRUE WHERE id = 1 AND name = 'One';
cqlsh> SELECT * FROM test.table1;

 id | name | enabled | foo
----+------+---------+-----
  1 |  One |    True | Bar

(1 rows)
cqlsh> SELECT * FROM test.table1_mv1;

 name | id | foo
------+----+-----
{code}

This happens because view metadata does not have column "enabled" .
The decision "should this base table update, update view as well?" is made by looking at view
metadata (among other things).
Due to this, some updates to base table do not result in updates to MV even if update qualifies
.

This patch disallows usage of any non PK columns in WHERE clause if they are not in SELECT
list.

Comments are appreciated !


> Filtered materialized views missing data
> ----------------------------------------
>
>                 Key: CASSANDRA-13547
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13547
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Materialized Views
>         Environment: Official Cassandra 3.10 Docker image (ID 154b919bf8ce).
>            Reporter: Craig Nicholson
>            Assignee: Krishna Dattu Koneru
>            Priority: Blocker
>              Labels: materializedviews
>             Fix For: 3.11.x
>
>
> When creating a materialized view against a base table the materialized view does not
always reflect the correct data.
> Using the following test schema:
> {code:title=Schema|language=sql}
> DROP KEYSPACE IF EXISTS test;
> CREATE KEYSPACE test
>   WITH REPLICATION = { 
>    'class' : 'SimpleStrategy', 
>    'replication_factor' : 1 
>   };
> CREATE TABLE test.table1 (
>                 id int,
>                 name text,
>                 enabled boolean,
>                 foo text,
>                 PRIMARY KEY (id, name));
> CREATE MATERIALIZED VIEW test.table1_mv1 AS SELECT id, name, foo
>                 FROM test.table1
>                 WHERE id IS NOT NULL 
>                 AND name IS NOT NULL 
>                 AND enabled = TRUE
>                 PRIMARY KEY ((name), id);
> CREATE MATERIALIZED VIEW test.table1_mv2 AS SELECT id, name, foo, enabled
>                 FROM test.table1
>                 WHERE id IS NOT NULL 
>                 AND name IS NOT NULL 
>                 AND enabled = TRUE
>                 PRIMARY KEY ((name), id);
> {code}
> When I insert a row into the base table the materialized views are updated appropriately.
(+)
> {code:title=Insert row|language=sql}
> cqlsh> INSERT INTO test.table1 (id, name, enabled, foo) VALUES (1, 'One', TRUE, 'Bar');
> cqlsh> SELECT * FROM test.table1;
>  id | name | enabled | foo
> ----+------+---------+-----
>   1 |  One |    True | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv1;
>  name | id | foo
> ------+----+-----
>   One |  1 | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv2;
>  name | id | enabled | foo
> ------+----+---------+-----
>   One |  1 |    True | Bar
> (1 rows)
> {code}
> Updating the record in the base table and setting enabled to FALSE will filter the record
from both materialized views. (+)
> {code:title=Disable the row|language=sql}
> cqlsh> UPDATE test.table1 SET enabled = FALSE WHERE id = 1 AND name = 'One';
> cqlsh> SELECT * FROM test.table1;
>  id | name | enabled | foo
> ----+------+---------+-----
>   1 |  One |   False | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv1;
>  name | id | foo
> ------+----+-----
> (0 rows)
> cqlsh> SELECT * FROM test.table1_mv2;
>  name | id | enabled | foo
> ------+----+---------+-----
> (0 rows)
> {code}
> However a further update to the base table setting enabled to TRUE should include the
record in both materialzed views, however only one view (table1_mv2) gets updated. (-)
> It appears that only the view (table1_mv2) that returns the filtered column (enabled)
is updated. (-)
> Additionally columns that are not part of the partiion or clustering key are not updated.
You can see that the foo column has a null value in table1_mv2. (-)
> {code:title=Enable the row|language=sql}
> cqlsh> UPDATE test.table1 SET enabled = TRUE WHERE id = 1 AND name = 'One';
> cqlsh> SELECT * FROM test.table1;
>  id | name | enabled | foo
> ----+------+---------+-----
>   1 |  One |    True | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv1;
>  name | id | foo
> ------+----+-----
> (0 rows)
> cqlsh> SELECT * FROM test.table1_mv2;
>  name | id | enabled | foo
> ------+----+---------+------
>   One |  1 |    True | null
> (1 rows)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message