cassandra-commits mailing list archives

From "Tyler Hobbs (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-10295) Support skipping MV read-before-write on a per-operation basis
Date Wed, 09 Sep 2015 17:16:45 GMT


Tyler Hobbs updated CASSANDRA-10295:
    Labels: client-impacting doc-impacting  (was: )

> Support skipping MV read-before-write on a per-operation basis
> --------------------------------------------------------------
>                 Key: CASSANDRA-10295
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Tyler Hobbs
>              Labels: client-impacting, doc-impacting
>             Fix For: 3.x
> This is similar in spirit to CASSANDRA-9779, but on a per-operation basis.  There are
many workloads that include a mixture of new insertions and overwrites.  In some cases, logic
outside of Cassandra guarantees that an inserted row does not already exist.  For example,
the primary key may include a UUID or another form of unique id (from, say, Snowflake).  
> When denormalizing manually, users can take advantage of this knowledge to avoid doing
a read-before-write, but with materialized views they don't have this option.  When the newly
inserted row also happens to be a new partition, MVs are still pretty efficient, because the
bloom filters allow us to quickly short-circuit the read.  However, when new rows are inserted
into existing partitions, the reads can become costly.
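As a rough illustration of the trade-off described above, here is a toy Python model of view maintenance. It is not Cassandra's implementation: `ToyTable`, `write`, and the `assume_new` flag are hypothetical names invented for this sketch, and the "view" is just a dictionary keyed by one column. It only shows why the read is needed for overwrites and why it is wasted work when the caller already knows the row is new.

```python
import uuid


class ToyTable:
    """Toy base table with a single view keyed by the 'email' column."""

    def __init__(self):
        self.base = {}   # primary key -> row dict
        self.view = {}   # view key (email) -> primary key
        self.reads = 0   # number of read-before-write lookups performed

    def write(self, pk, row, assume_new=False):
        # assume_new is a hypothetical per-operation hint: the caller
        # promises that no row with this primary key exists yet.
        if not assume_new:
            self.reads += 1
            old = self.base.get(pk)
            if old is not None and old["email"] != row["email"]:
                # Remove the stale view entry left by the previous value.
                self.view.pop(old["email"], None)
        self.base[pk] = row
        self.view[row["email"]] = pk


table = ToyTable()

# New row with a UUID key: the caller knows it cannot already exist,
# so the read-before-write is skipped.
new_pk = uuid.uuid4()
table.write(new_pk, {"email": "a@example.com"}, assume_new=True)

# Overwrite of an existing row: the read is needed to clean up the view.
table.write(new_pk, {"email": "b@example.com"})
```

In this model, passing `assume_new=True` for a row that does already exist would leave a stale entry in `view`, which is the correctness risk any such hint would carry.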
> I'd like to consider exposing a way for the user to indicate that an inserted row is
new on a per-operation basis.  Internally, this could potentially use the mechanism from CASSANDRA-9779,
depending on how that's implemented.  As far as the API goes, I'm not sure.  Perhaps an "assertion"
clause in inserts would work well:
> {noformat}
> {noformat}
> The choice of API should also take into consideration potential future enhancements along
these lines.  For example, we might want to support asserting that a given column has a known
current value (as another means of avoiding read-before-writes).
> If we implement this, we should make sure that hints, logged batches, and commitlog replay
handle this safely.  If the original timestamp is used for replay, I believe it should be
idempotent (during the gc_grace window), but I could be missing something.
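The idempotency argument above can be sketched with a toy last-write-wins cell in Python. This is a simplified assumption-laden model, not Cassandra's storage engine: `apply_mutation` is a hypothetical helper, a cell is just a `(timestamp, value)` tuple, and tombstones, gc_grace, and view updates are not modelled at all.

```python
def apply_mutation(current, mutation):
    """Merge a (timestamp, value) mutation into a cell, last write wins.

    Tuples compare by timestamp first, then by value, so the pair with
    the higher timestamp is kept.
    """
    if current is None:
        return mutation
    return max(current, mutation)


original = (100, "v1")   # first write, timestamp 100
later = (200, "v2")      # later overwrite, timestamp 200

state = apply_mutation(None, original)
state = apply_mutation(state, later)

# Replaying the first write with its ORIGINAL timestamp changes nothing.
replayed = apply_mutation(state, original)

# Replaying it with a fresh timestamp would clobber the later write.
restamped = apply_mutation(state, (300, "v1"))
```

Here `replayed` equals `state`, while `restamped` does not, which is why the choice of timestamp on replay matters for hints, logged batches, and commitlog replay.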

This message was sent by Atlassian JIRA
