cassandra-commits mailing list archives

From "Tupshin Harper (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
Date Fri, 17 Jul 2015 20:52:08 GMT


Tupshin Harper commented on CASSANDRA-6477:

OK, so let me summarize my view of the conflicting viewpoints here:
# If the MV shares the same partition key (and only reorders the partition based on different
clustering columns), then the problem is relatively easy. Unfortunately the general consensus
is that a common case will be to have different partition keys in the MV than the base table,
so we can't support only that easy case.
# If the MV has a different partition key than the base table, then there are inherently more
nodes involved in fulfilling the entire request, and we have to address that case.
# As [~tjake] and [~jbellis] say, the more nodes involved in a query, the higher the risk
of unavailability if the MV is updated synchronously.
# Some use cases expect synchronous updates (as argued by [~rustyrazorblade] and [~brianmhess]).
# But other use cases definitely do not. I think it is absurd to say that just because a
table has a MV, every write should care about the MV. Even more absurd to say that adding
an MV to a table will reduce the availability of all writes to the base table. 
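
To put a rough number on the availability point in the list above, here is a minimal sketch. It assumes a deliberately simple model that is not from this thread: the base table and each MV live on separate replica sets, each replica set can satisfy its consistency level with the same probability, and failures are independent. The function name and parameters are hypothetical.

```python
def sync_write_availability(p_replica_set: float, num_views: int) -> float:
    """Chance that a fully synchronous write can succeed.

    Assumption (illustrative only): the base table and each of the
    `num_views` MVs sit on independent replica sets, and each set can
    meet its consistency level with probability `p_replica_set`.
    """
    if not 0.0 <= p_replica_set <= 1.0:
        raise ValueError("p_replica_set must be between 0 and 1")
    if num_views < 0:
        raise ValueError("num_views must be non-negative")
    # The write succeeds only if the base table AND every view succeed.
    return p_replica_set ** (1 + num_views)


if __name__ == "__main__":
    for views in (0, 1, 2, 4):
        print(views, sync_write_availability(0.999, views))
```

Under this model, every MV updated synchronously multiplies in another factor below one, which is the concern with tying base-table writes to the MV.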

Given all of those, the conclusion that both sync and async forms are necessary seems totally

Ideally, I'd like to see an extension of what [~iamaleksey] proposed above, but one that is
much more thorough and flexible.

If each request were able to pass multiple consistency-level contracts to the coordinator,
each one could represent the expectation for a separate callback at the driver level.
For example, a query to a table with a MV could express the following compound consistency levels:
{noformat} {LQ, LOCAL_ONE{DC3,DC4}, LQ{MV1,MV2}} {noformat}
That would tell the coordinator to deliver three separate notifications back to the client.
One when LQ in the local dc was fulfilled. Another when at least one copy was delivered to
each of DC3 and DC4, and another when LQ was fulfilled in the local dc for MV1 and MV2.
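
As a rough illustration of how a coordinator or driver might interpret such a compound level, the sketch below parses the string from the example into one contract per expected notification. The brace syntax is taken from the example above. The `Contract` type, the `parse_compound_cl` function, and the reading of an empty target list as "local DC, base table" are hypothetical and not part of any existing Cassandra or driver API.

```python
import re
from typing import List, NamedTuple, Tuple


class Contract(NamedTuple):
    level: str                # e.g. "LQ" or "LOCAL_ONE"
    targets: Tuple[str, ...]  # e.g. ("DC3", "DC4"); empty = local DC, base table


# One contract: a level name, optionally followed by {target,target,...},
# terminated by a comma or the end of the input.
_CONTRACT = re.compile(r"\s*([A-Z_]+)\s*(?:\{([^{}]*)\})?\s*(?:,|$)")


def parse_compound_cl(text: str) -> List[Contract]:
    """Parse '{LQ, LOCAL_ONE{DC3,DC4}, LQ{MV1,MV2}}' into a list of contracts."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("compound consistency level must be wrapped in braces")
    inner = text[1:-1]
    contracts: List[Contract] = []
    pos = 0
    while pos < len(inner):
        match = _CONTRACT.match(inner, pos)
        if match is None:
            raise ValueError(f"cannot parse contract at offset {pos}: {inner[pos:]!r}")
        raw_targets = match.group(2) or ""
        targets = tuple(t.strip() for t in raw_targets.split(",") if t.strip())
        contracts.append(Contract(match.group(1), targets))
        pos = match.end()
    return contracts


if __name__ == "__main__":
    # Each parsed contract corresponds to one notification back to the client.
    for contract in parse_compound_cl("{LQ, LOCAL_ONE{DC3,DC4}, LQ{MV1,MV2}}"):
        print(contract)
```

Keeping each contract as a separate value mirrors the idea of one callback per contract: the client could attach a handler to each entry instead of waiting on a single acknowledgement.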

I realize that this is a very far-fetched proposal, but I wanted to throw it out there as,
imo, it reflects the theoretically best option that fulfills everybody's requirements (and
is also a very general mechanism that could be used in other scenarios).

Short of that, I don't think there is any choice but to support both sync and async forms
of writes to tables with MVs.

One more point (not to distract from the above): with the current design of MVs, there will
always be a risk of inconsistent reads (timeouts leaving data queryable in the primary table
but not in one or more MVs) until the data is eventually propagated to the MV. While it would
come at a high cost, RAMP would still be useful to provide read isolation in that scenario.

> Materialized Views (was: Global Indexes)
> ----------------------------------------
>                 Key: CASSANDRA-6477
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Carl Yeksigian
>              Labels: cql
>             Fix For: 3.0 beta 1
>         Attachments:, users.yaml
> Local indexes are suitable for low-cardinality data, where spreading the index across
> the cluster is a Good Thing.  However, for high-cardinality data, local indexes require querying
> most nodes in the cluster even if only a handful of rows is returned.
