cassandra-commits mailing list archives

From "T Jake Luciani (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-9779) Append-only optimization
Date Wed, 22 Jun 2016 14:50:58 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344405#comment-15344405 ]

T Jake Luciani edited comment on CASSANDRA-9779 at 6/22/16 2:50 PM:
--------------------------------------------------------------------

bq.  if you violate the INSERTS ONLY contract by updating existing rows, Cassandra will give
you one of those versions back when you query it, but not necessarily the most recent.

It sounds like you are saying there are no guarantees.   

I've given this some thought, and I think the best way we can syntactically "do
something" is to combine this ticket with the idea [~thobbs] touched on in CASSANDRA-9928.
This might be what you are describing we should do, but I'll just restate it.

bq. One possible solution is to require that all non-PK columns that are in a view PK be updated
simultaneously. T Jake Luciani mentioned possible problems from read repair, but it seems
like, with this restriction in place, any read repairs would end up repairing all non-PK columns
at once.

Basically, this would add a mode where we only allow INSERT of *all* columns every time.
While this sounds restrictive, it also forces the user to deal with the fact that making updates
is conceptually/logistically hard, since we would kick out all client mutations that don't specify
all columns.  Sure, you could subvert this, but at least the server can alert the user
that updating existing data, as they can with other tables, is hard.

So the proposal is:

  * Add a table level flag/syntax to mark that a table is INSERT ONLY (which can be altered
if there's an emergency).
  * Reject any INSERTS/UPSERTS that do not specify all columns
  * Possibly always return the earliest row if there is a conflict.
  * When writing to the memtable we can add a putIfAbsent method to reject/ignore updates
(to cover some minimal bases) 
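
As a rough illustration of the bullets above, here is a minimal Java sketch. It is not Cassandra's memtable code: the class InsertOnlyTable, its methods, and the use of a ConcurrentHashMap as a stand-in for the memtable are all assumptions made for this example. It shows rejecting writes that do not name every column, ignoring a second write to an existing key via putIfAbsent, and picking the earliest version on conflict.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical insert-only table; names and structure are illustrative only. */
final class InsertOnlyTable {
    // Every column declared for the table (assumed schema representation).
    private final Set<String> columns;
    // Stand-in for the memtable: primary key -> complete row.
    private final ConcurrentMap<String, Map<String, Object>> memtable = new ConcurrentHashMap<>();

    InsertOnlyTable(Set<String> columns) {
        this.columns = Collections.unmodifiableSet(new HashSet<>(columns));
    }

    /**
     * Stores the row only if no row exists for the key.
     * Returns true if stored, false if an existing row was kept (update ignored).
     * Throws IllegalArgumentException if the write does not specify exactly all columns.
     */
    boolean insert(String primaryKey, Map<String, Object> row) {
        if (!row.keySet().equals(columns)) {
            throw new IllegalArgumentException(
                "INSERT ONLY table: write must specify all columns " + columns);
        }
        Map<String, Object> copy = Collections.unmodifiableMap(new HashMap<>(row));
        // putIfAbsent returns null only when there was no previous mapping.
        return memtable.putIfAbsent(primaryKey, copy) == null;
    }

    /** Returns the stored row, or null if the key is absent. */
    Map<String, Object> get(String primaryKey) {
        return memtable.get(primaryKey);
    }

    /** Conflict rule from the proposal: the earliest write wins (ties keep rowA). */
    static <T> T earliest(long timestampA, T rowA, long timestampB, T rowB) {
        return timestampA <= timestampB ? rowA : rowB;
    }
}
```

Note that this only covers a single in-memory map; as the last bullet says, a putIfAbsent at the memtable level covers minimal bases and would not by itself catch a conflicting row already flushed to an sstable or held on another replica.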


was (Author: tjake):
bq.  if you violate the INSERTS ONLY contract by updating existing rows, Cassandra will give
you one of those versions back when you query it, but not necessarily the most recent.

It sounds like you are saying there are no guarantees.   

I've given this some thought, and I think the best way we can syntactically "do
something" is to combine this ticket with the idea [~thobbs] touched on in CASSANDRA-9928.
This might be what you are describing we should do, but I'll just restate it.

bq. One possible solution is to require that all non-PK columns that are in a view PK be updated
simultaneously. T Jake Luciani mentioned possible problems from read repair, but it seems
like, with this restriction in place, any read repairs would end up repairing all non-PK columns
at once.

Basically, this would add a mode where we INSERT *all* columns every time.  While this sounds
restrictive, it also forces the user to deal with the fact that making updates is conceptually/logistically
hard, since we would kick out all client mutations that don't specify all columns.  Sure, you
could subvert this, but at least the server can alert the user that updating existing
data, as they can with other tables, is hard.

So the proposal is:

  * Add a table level flag/syntax to mark that a table is INSERT ONLY (which can be altered
if there's an emergency).
  * Reject any INSERTS/UPSERTS that do not specify all columns
  * Possibly always return the earliest row if there is a conflict.
  * When writing to the memtable we can add a putIfAbsent method to reject/ignore updates
(to cover some minimal bases) 

> Append-only optimization
> ------------------------
>
>                 Key: CASSANDRA-9779
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9779
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: CQL
>            Reporter: Jonathan Ellis
>             Fix For: 3.x
>
>
> Many common workloads are append-only: that is, they insert new rows but do not update
> existing ones.  However, Cassandra has no way to infer this and so it must treat all tables
> as if they may experience updates in the future.
> If we added syntax to tell Cassandra about this ({{WITH INSERTS ONLY}} for instance)
> then we could do a number of optimizations:
> - Compaction would only need to worry about defragmenting partitions, not rows.  We could
> default to DTCS or similar.
> - CollationController could stop scanning sstables as soon as it finds a matching row
> - Most importantly, materialized views wouldn't need to worry about deleting prior values,
> which would eliminate the majority of the MV overhead



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
