cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-13066) Fast streaming with materialized views
Date Thu, 13 Apr 2017 14:25:42 GMT


Paulo Motta commented on CASSANDRA-13066:

While it may make sense to pursue this optimization, I'm not sure adding a {{mv_fast_stream}}
option is the best way to expose this to general usage for the following reasons:
a) It has a limited scope requiring users to know streaming internals of MVs to enable it,
so it's not very friendly.
b) It has significant foot-shooting potential when users enable it and then perform partial
writes or updates to existing rows; users may enable it thinking fast=good without considering
the consequences.

It basically boils down to Sylvain's comment on CASSANDRA-9779:

bq. It seems clear to me that this will add complexity from the user point of view (it's a
new concept that will either have good footshooting potential (if we were to just trust the
user to insert only without checking it) and be annoying to use (if we force all columns every
time)), so it sounds to me like we would need to demonstrate fairly big performance benefits
to be worth doing (keep in mind that once we add such thing, we can't easily remove it, even
if the improvement become obsolete).

With that said, since this would only be applicable to append-only MVs, I'd be more in favor
of providing the whole feature set of append-only MVs instead, which would include this and
other optimizations (such as skipping read-before-write) and also enforce the append-only
contract defined at MV creation. That would be much safer and give users better-defined
semantics.
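
To illustrate the foot-shooting concern, here is a minimal toy sketch in Python. It is an in-memory model only, not Cassandra code: the {{Node}} class, {{write_path}}, {{fast_path}} and {{stale_entries}} are hypothetical names made up for this example, and it ignores timestamps, tombstones and replication. It assumes a view keyed on a single base column, and shows that skipping the read-before-write leaves a stale view entry once an existing row is updated, while append-only writes stay consistent.

```python
# Toy in-memory model; none of these names are Cassandra APIs.

class Node:
    def __init__(self):
        self.base = {}     # primary key -> value of the column the view is keyed on
        self.view = set()  # (view key, primary key) pairs

    def write_path(self, pk, value):
        """Regular path: read the existing row first, then fix up the view."""
        old = self.base.get(pk)  # the read-before-write
        if old is not None and old != value:
            self.view.discard((old, pk))  # drop the now-stale view entry
        self.base[pk] = value
        self.view.add((value, pk))

    def fast_path(self, pk, value):
        """Hypothetical fast path: no read, so no stale-entry cleanup."""
        self.base[pk] = value
        self.view.add((value, pk))

    def stale_entries(self):
        """View entries that no longer match the base table."""
        return {(v, pk) for (v, pk) in self.view if self.base.get(pk) != v}


# Append-only workload: every primary key is written exactly once.
append_only = Node()
append_only.fast_path("k1", "a")
append_only.fast_path("k2", "b")

# Update workload on the fast path: "k1" is overwritten without a read.
updated = Node()
updated.fast_path("k1", "a")
updated.fast_path("k1", "b")

# Same update workload through the regular write path.
regular = Node()
regular.write_path("k1", "a")
regular.write_path("k1", "b")

print(append_only.stale_entries())
print(updated.stale_entries())
print(regular.stale_entries())
```

In this model only the updated-row case on the fast path ends up with a view entry that no longer matches the base table, which is why enforcing an append-only contract looks safer than a standalone switch.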

> Fast streaming with materialized views
> --------------------------------------
>                 Key: CASSANDRA-13066
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benjamin Roth
>            Assignee: Benjamin Roth
>             Fix For: 4.0
> I propose adding a configuration option to send streams of tables with MVs not through
the regular write path.
> This may be either a global option or better a CF option.
> Background:
> A repair of a CF with an MV that is much out of sync creates many streams. These streams
all go through the regular write path to assert local consistency of the MV. This again causes
a read before write for every single mutation which again puts a lot of pressure on the node
- much more than simply streaming the SSTable down.
> In some cases this can be avoided. Instead of only repairing the base table, all base
+ mv tables would have to be repaired. But this can break eventual consistency between base
table and MV. The proposed behaviour is always safe, when having append-only MVs. It also
works when using CL_QUORUM writes but it cannot be absolutely guaranteed, that a quorum write
is applied atomically, so this can also lead to inconsistencies, if a quorum write is started
but one node dies in the middle of a request.
> So, this proposal can help a lot in some situations but also can break consistency in
others. That's why it should be left upon the operator if that behaviour is appropriate for
individual use cases.
> This issue came up here:
