cassandra-commits mailing list archives

From "Robert Stupp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9779) Append-only optimization
Date Sun, 12 Jul 2015 10:19:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623732#comment-14623732 ]

Robert Stupp commented on CASSANDRA-9779:
-----------------------------------------

IMO it would be logical to disallow {{UPDATE}} for {{WITH INSERTS ONLY}} tables (after all,
that is what {{WITH INSERTS ONLY}} says).

Would {{WITH INSERTS ONLY}} also mean restricting tables to primary keys without a clustering key?
Maybe I didn't completely get it. What I'm thinking about is that one partition can still
be split over the memtable and multiple sstables, which would conflict with the compaction/read-path
optimizations. For example, if you have a table with {{PRIMARY KEY ((year, month, day), hour,
minute, second)}} and several million INSERTs per day, it is likely that each day's data ends up
in multiple sstables. I mean: I'm a bit afraid that partitions would get too tiny, with all
the consequences that brings (too many queries, not being able to insert from different clients
for the same day).
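
To illustrate the concern, here is a minimal Python sketch, not Cassandra code. It models the
memtable and each sstable as plain dicts; the names {{read_row}} and {{read_partition}} and the
sample data are hypothetical. It assumes rows are never updated once written, so a single-row
read can stop at the first source that holds the row, while a whole-partition read still has
to visit every source that holds a fragment of that day.

```python
from typing import Dict, List, Optional, Tuple

PartitionKey = Tuple[int, int, int]    # (year, month, day)
ClusteringKey = Tuple[int, int, int]   # (hour, minute, second)
Row = Dict[str, int]
# One source (memtable or sstable): partition key -> clustering key -> row
Source = Dict[PartitionKey, Dict[ClusteringKey, Row]]


def read_row(sources: List[Source], pk: PartitionKey,
             ck: ClusteringKey) -> Tuple[Optional[Row], int]:
    """Return (row, sources scanned), stopping at the first match.

    Stopping early is only safe under the append-only assumption:
    no other source can hold a newer version of the same row.
    """
    for scanned, source in enumerate(sources, start=1):
        row = source.get(pk, {}).get(ck)
        if row is not None:
            return row, scanned
    return None, len(sources)


def read_partition(sources: List[Source],
                   pk: PartitionKey) -> Tuple[Dict[ClusteringKey, Row], int]:
    """Return (all rows of the partition, sources scanned).

    The partition may be fragmented, so every source must be visited.
    """
    merged: Dict[ClusteringKey, Row] = {}
    for source in sources:
        merged.update(source.get(pk, {}))
    return merged, len(sources)


# One day's partition spread over the memtable and two sstables.
day = (2015, 7, 12)
memtable: Source = {day: {(10, 19, 4): {"value": 3}}}
sstable_new: Source = {day: {(9, 0, 0): {"value": 2}}}
sstable_old: Source = {day: {(8, 0, 0): {"value": 1}}}
sources = [memtable, sstable_new, sstable_old]

row, scanned_for_row = read_row(sources, day, (10, 19, 4))
rows, scanned_for_partition = read_partition(sources, day)
```

In this toy model the row read touches one source, but the partition read touches all three,
which is why fragmented partitions limit what the read path can skip.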

If such a {{WITH INSERTS ONLY}} table has no clustering key, even more optimizations might
be possible (the key-cache entry would not need the sstable ref in its key, only in its value, so
we could do the key-cache lookup first and skip the bloom-filter lookup on a hit).
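
A rough Python sketch of that key-cache idea, under stated assumptions: each partition is a
single row that lives in exactly one sstable, and the cache maps the partition key directly
to that sstable. The {{SSTable}} class, {{might_contain}} and {{lookup}} are hypothetical
stand-ins, not Cassandra's actual classes, and the "bloom filter" here is an exact membership
check used only to count how often it is consulted.

```python
from typing import Dict, List, Optional


class SSTable:
    """Toy sstable: partition key -> row, plus a counter of filter checks."""

    def __init__(self, name: str, partitions: Dict[str, str]) -> None:
        self.name = name
        self.partitions = dict(partitions)
        self.bloom_checks = 0

    def might_contain(self, pk: str) -> bool:
        # Stand-in for a bloom filter (a real one can return false positives).
        self.bloom_checks += 1
        return pk in self.partitions


def lookup(pk: str, sstables: List[SSTable],
           key_cache: Dict[str, SSTable]) -> Optional[str]:
    """Read one partition, consulting the key cache before any filter."""
    cached = key_cache.get(pk)
    if cached is not None:
        # Cache hit: the value names the sstable, so no filter is checked.
        return cached.partitions[pk]
    for sstable in sstables:
        if sstable.might_contain(pk) and pk in sstable.partitions:
            key_cache[pk] = sstable  # sstable ref stored in the value
            return sstable.partitions[pk]
    return None


sstable_a = SSTable("a", {"k1": "row1"})
sstable_b = SSTable("b", {"k2": "row2"})
cache: Dict[str, SSTable] = {}

first = lookup("k2", [sstable_a, sstable_b], cache)    # fills the cache
checks_after_first = (sstable_a.bloom_checks, sstable_b.bloom_checks)
second = lookup("k2", [sstable_a, sstable_b], cache)   # served from the cache
checks_after_second = (sstable_a.bloom_checks, sstable_b.bloom_checks)
```

The second lookup leaves both filter counters unchanged, which is the saving described above;
a cache keyed by (sstable, partition key) could not be consulted before picking an sstable.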

> Append-only optimization
> ------------------------
>
>                 Key: CASSANDRA-9779
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9779
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>             Fix For: 3.x
>
>
> Many common workloads are append-only: that is, they insert new rows but do not update
> existing ones.  However, Cassandra has no way to infer this and so it must treat all tables
> as if they may experience updates in the future.
> If we added syntax to tell Cassandra about this ({{WITH INSERTS ONLY}} for instance)
> then we could do a number of optimizations:
> - Compaction would only need to worry about defragmenting partitions, not rows.  We could
>   default to DTCS or similar.
> - CollationController could stop scanning sstables as soon as it finds a matching row
> - Most importantly, materialized views wouldn't need to worry about deleting prior values,
>   which would eliminate the majority of the MV overhead



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
