cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-13826) Specialize row structure to support complex Materialized Views liveness
Date Wed, 30 Aug 2017 10:30:00 GMT
Paulo Motta created CASSANDRA-13826:
---------------------------------------

             Summary: Specialize row structure to support complex Materialized Views liveness
                 Key: CASSANDRA-13826
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13826
             Project: Cassandra
          Issue Type: Improvement
          Components: Materialized Views
            Reporter: Paulo Motta


Unlike an ordinary row, which is live if its PK or any of its columns is live, a view
row has different liveness requirements, summarized by [~jasonstack] on CASSANDRA-11500:

{quote}
1. base pk and view pk are the same (order doesn't matter) and view has no filter conditions
or only conditions on base pk.
(a filter condition means e.g. c = 1 in the view's where clause. filter conditions are not a concern
here, since there is no previous view data to be cleared.)

view row exists if any of following is true:
* base row pk has live livenessInfo(timestamp) and base row pk satisfies view's filter conditions
if any.
* or one of base row columns selected in view has live timestamp (via update) and base row
pk satisfies view's filter conditions if any. this is handled by the existing mechanism of liveness
and tombstone since all info is included in the view row
* or one of base row columns not selected in view has live timestamp (via update) and base
row pk satisfies view's filter conditions if any. Those unselected columns' timestamp/ttl/cell-deletion
info are not currently stored on the view row.

2. base column used in view pk or view has filter conditions on base non-key column which
can also lead to entire view row being wiped.

view row exists if any of following is true:

* base row pk has live livenessInfo(timestamp) && base column used in view pk is not
null but has no timestamp && conditions are satisfied. ( pk having live livenessInfo means
it is not deleted by tombstone)
* or base row column in view pk has timestamp (via update) && conditions are satisfied.
eg. if base column used in view pk is TTLed, entire view row should be wiped.
{quote}
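
The two cases quoted above can be read as two predicates. The following is a minimal Python sketch of that reading, not Cassandra's implementation: the `BaseRow` type, its fields, and the `filter_ok`, `view_pk_col_not_null` and `view_pk_col_has_timestamp` parameters are hypothetical names introduced only for illustration, and each timestamp is reduced to a boolean "is live" flag.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BaseRow:
    # Hypothetical model: True means the item currently has a live timestamp.
    pk_live: bool  # row-level livenessInfo is live (not deleted by a tombstone)
    selected_live: Dict[str, bool] = field(default_factory=dict)    # columns included in the view
    unselected_live: Dict[str, bool] = field(default_factory=dict)  # columns left out of the view


def view_row_exists_case1(row: BaseRow, filter_ok: bool = True) -> bool:
    """Case 1: base PK and view PK are made of the same columns."""
    if not filter_ok:
        return False
    return (
        row.pk_live
        or any(row.selected_live.values())
        or any(row.unselected_live.values())
    )


def view_row_exists_case2(
    row: BaseRow,
    view_pk_col_not_null: bool,
    view_pk_col_has_timestamp: bool,
    filter_ok: bool = True,
) -> bool:
    """Case 2: a base non-PK column is used in the view PK."""
    if not filter_ok:
        return False
    # First bullet: live row liveness, column set but without its own timestamp.
    via_row_liveness = (
        row.pk_live and view_pk_col_not_null and not view_pk_col_has_timestamp
    )
    # Second bullet: the column itself carries a live timestamp (via update).
    via_column_update = view_pk_col_has_timestamp
    return via_row_liveness or via_column_update
```

Under this reading, `view_row_exists_case1(BaseRow(pk_live=False, unselected_live={"c": True}))` is true even though nothing stored on the view row is live, which is the situation the third bullet of case 1 describes.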

These additional requirements were overlooked during the original MV design and caused
problems when base rows or columns are updated or removed, as described in CASSANDRA-13127,
CASSANDRA-13409 and CASSANDRA-11500.

On CASSANDRA-11500 we will tweak the existing mechanism to fix most of the above issues.
Two cases will remain unsupported: out-of-order deletion of unselected columns on views sharing
partition key components with the base table, and filtering by non-PK columns. The former is a
limitation of the original MV design; the latter is a relatively recent feature (CASSANDRA-10368)
that overlooked this requirement and will be reverted on CASSANDRA-13798.

This ticket is to go back to the drawing board and discuss and implement a storage engine
extension to properly support the following cases:
- Out-of-order deletion of unselected column on view sharing partition key components with
base ([ignored test|https://github.com/apache/cassandra/blob/add5face50f2eccbc1a53e0fe22e2d79ba856db1/test/unit/org/apache/cassandra/cql3/ViewTest.java#L87])
- Filtering by non-primary key and/or unselected columns (Follow-up CASSANDRA-13798,
[ignored tests|https://github.com/apache/cassandra/blob/add5face50f2eccbc1a53e0fe22e2d79ba856db1/test/unit/org/apache/cassandra/cql3/ViewFilteringTest.java#L88])
- Rethink shadowable tombstone mechanism and remove workarounds introduced by CASSANDRA-11500,
such as using expired liveness info to represent commutative deletion ([TODO|https://github.com/apache/cassandra/blob/e0da138ab10f6c0fc014de86fb251e11358d80cc/src/java/org/apache/cassandra/db/view/ViewUpdateGenerator.java#L429]).
- Add support for dropping unselected columns on the base table and reflecting that on views
([commented test|https://github.com/apache/cassandra/commit/add5face50f2eccbc1a53e0fe22e2d79ba856db1])
- Upgrade from the previous to the new structure

Zhao's virtual cells proposal from CASSANDRA-11500 is probably a good starting point, but we
need to discuss and validate it to make sure it is efficient, makes adequate reuse of existing
structures, and does not introduce unnecessary complexity in the storage engine that we will have
to maintain in the future. In addition, we should probably contemplate supporting
multiple non-PK cols in MV clustering (CASSANDRA-10226), which introduces liveness
requirements for views beyond the ones mentioned above, as well as other simplifications we
can make to the view row structure.
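
To make the out-of-order requirement concrete, here is a rough Python sketch of per-column liveness tracking, loosely in the spirit of keeping liveness information for unselected columns on the view row. It is an illustration under stated assumptions, not the virtual cells proposal itself nor any existing Cassandra structure: `VirtualCell`, `apply_all`, `row_is_live` and `order_independent` are hypothetical names. Each unselected column keeps its highest write timestamp and its highest deletion timestamp; since both are merged with `max`, applying the same updates in any order gives the same state.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, Iterable, List, Tuple

NO_TS = -1  # sentinel: no write or deletion seen yet


@dataclass(frozen=True)
class VirtualCell:
    # Hypothetical per-column liveness record for an unselected base column.
    write_ts: int = NO_TS
    delete_ts: int = NO_TS

    def merge(self, other: "VirtualCell") -> "VirtualCell":
        # max() is commutative and associative, so merge order does not matter.
        return VirtualCell(
            max(self.write_ts, other.write_ts),
            max(self.delete_ts, other.delete_ts),
        )

    def is_live(self) -> bool:
        # In this sketch a deletion wins a timestamp tie.
        return self.write_ts > self.delete_ts


Update = Tuple[str, VirtualCell]


def apply_all(updates: Iterable[Update]) -> Dict[str, VirtualCell]:
    """Fold a sequence of (column, cell) updates into per-column state."""
    cells: Dict[str, VirtualCell] = {}
    for column, cell in updates:
        cells[column] = cells.get(column, VirtualCell()).merge(cell)
    return cells


def row_is_live(cells: Dict[str, VirtualCell]) -> bool:
    """The view row is kept alive if any tracked column is live."""
    return any(cell.is_live() for cell in cells.values())


def order_independent(updates: List[Update]) -> bool:
    """Check that every delivery order of the updates yields the same state."""
    results = [apply_all(order) for order in permutations(updates)]
    return all(result == results[0] for result in results)
```

For example, writing column `a` at timestamp 1, writing `b` at 2, deleting `b` at 3 and deleting `a` at 4 leaves the row dead regardless of delivery order. Whether the extra per-column state is an acceptable cost is exactly the efficiency question raised above.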



