cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ZhaoYang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted
Date Fri, 01 Sep 2017 09:51:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150303#comment-16150303
] 

ZhaoYang commented on CASSANDRA-11500:
--------------------------------------

[~pauloricardomg] thanks for the feedback (y)

bq. I wasn't very comfortable with our previous approach of enforcing strict liveness during
row merge, since it changes a lot of low-level structures/interfaces (like BTreeRow/MergeListener,
etc) to enforce a table-level setting. Since we'll probably get rid of this when doing a proper
implementation of virtual cells , I updated on this commit to perform the filtering during
read instead which will give us the same result but with less change in unrelated code. Do
you see any problem with this approach?

As we discussed offline, we need to make sure the raw data including tombstone, expired liveness
are shipped to the coordinator side.  Enforcing strict liveness in {{ReadCommand.executeLocally()}}
would remove the row before digest or data response. Instead, we add {{enforceStrictLiveness}}
to {{Row.purge}} to get the same result but less interfaces changes for {{Row}}.

bq. One problem of replacing shadowable tombstones by expired liveness info is that it stores
an additional unused ttl field for every shadowed view entry to solve the commutative view
deletion problem. In order to avoid this I updated the patch to only use expired ttl when
a shadowable tombstone would not work along with an explanation on why that is used since
it's a hack

Shadowable tombstone will be deprecated and use expired livenessInfo if the deletion time
is greater than merged-row deletion to avoid uncessary expired livenessInfo.

bq. in TableViews.java, the DeletionTracker should be applied even if existing has no data,
eg. partition-deletion

It's tested by  "testRangeDeletionWithFlush()" in ViewTest. Without partition deletion info
from deletion tracker, existing row is given as empty and it will resurrect deleted cells.

bq.  In order to prevent against this, I added a note to the Upgrading section of NEWS.txt
explaining about this caveat and that running repair before the upgrade should be sufficient
to avoid it.

(y)

| source | unit | [dtest| 
| [trunk|https://github.com/jasonstack/cassandra/commits/trunk-11500-squashed] |  https://circleci.com/gh/jasonstack/cassandra/551
| x |
| [3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-11500-strict-3.11] |  https://circleci.com/gh/jasonstack/cassandra/557
| x |
| [3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-11500-strict-3.0] |  https://circleci.com/gh/jasonstack/cassandra/556|
 x |
| [dtest|https://github.com/riptano/cassandra-dtest/commits/11500-poc]|
{code}
Changes:
1. Using expired livenessInfo if computed deletion time is greater than merged row deletion.
There are only 2 cases:
      a. non-pk base column used in view pk is removed by partial update or partial delete
      b. unselected base column is removed by partial update or partial delete
   
   Current shadowable tombstone is not used to avoid the issue of resurrecting deleted cells.
We will expired-livenessInfo and merged base row deletion instead. 

2. It's strict-liveness iff there is non-key base column in view-pk. The existence of view
row is solely base on this non-key base column.

3. If there is no non-pk base column in view-pk, the view's liveness/deletion is using max
of base livenessIno + unselected column. unselected column's ttl is used only when it affects
view row liveness. Selected columns won't contribute to livenessInfo or row deletion.
    * this wouldn't support complex cases as explained above. eg. c/d unselected, update c@10,
delete c@11, update d@5. view row should be alive but dead

4. in TableViews.java, the DeletionTracker should be applied even if existing has no data,
eg. partition-deletion

5. When generating read command to read existing base data, need to query all base columns
instead of view's queried column if base and view having same key columns to read unselected
column. 
{code}

> Obsolete MV entry may not be properly deleted
> ---------------------------------------------
>
>                 Key: CASSANDRA-11500
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Materialized Views
>            Reporter: Sylvain Lebresne
>            Assignee: ZhaoYang
>             Fix For: 3.0.x, 3.11.x, 4.x
>
>
> When a Materialized View uses a non-PK base table column in its PK, if an update changes
that column value, we add the new view entry and remove the old one. When doing that removal,
the current code uses the same timestamp than for the liveness info of the new entry, which
is the max timestamp for any columns participating to the view PK. This is not correct for
the deletion as the old view entry could have other columns with higher timestamp which won't
be deleted as can easily shown by the failing of the following test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS NOT NULL
PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old (invalid) and
the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in the old
view entry (which we know since we read the pre-existing base row), and that is what CASSANDRA-11475
does (the test above thus doesn't fail on that branch).
> Unfortunately, even then we can still have problems if further updates requires us to
overide the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS NOT NULL
PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the entry for a=1
with timestamp 10
> UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert an entry
for a=1 but shouldn't be deleted by the prior deletion
> UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this game more
than once
> UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1;
> ...
> {noformat}
> In a way, this is saying that the "shadowable" deletion mechanism is not general enough:
we need to be able to re-insert an entry when a prior one had been deleted before, but we
can't rely on timestamps being strictly bigger on the re-insert. In that sense, this can be
though as a similar problem than CASSANDRA-10965, though the solution there of a single flag
is not enough since we can have to replace more than once.
> I think the proper solution would be to ship enough information to always be able to
decide when a view deletion is shadowed. Which means that both liveness info (for updates)
and shadowable deletion would need to ship the timestamp of any base table column that is
part the view PK (so {{a}} in the example below).  It's doable (and not that hard really),
but it does require a change to the sstable and intra-node protocol, which makes this a bit
painful right now.
> But I'll also note that as CASSANDRA-1096 shows, the timestamp is not even enough since
on equal timestamp the value can be the deciding factor. So in theory we'd have to ship the
value of those columns (in the case of a deletion at least since we have it in the view PK
for updates). That said, on that last problem, my preference would be that we start prioritizing
CASSANDRA-6123 seriously so we don't have to care about conflicting timestamp anymore, which
would make this problem go away.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message