cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8543) Allow custom code to control behavior of reading and compaction
Date Mon, 29 Dec 2014 16:39:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260204#comment-14260204 ]

Aleksey Yeschenko commented on CASSANDRA-8543:
----------------------------------------------

Use native protocol batching with separate prepared inserts, but make sure that you only
batch columns/rows with the same partition key.
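A minimal sketch of the grouping step, assuming each row is a `(partition_key, clustering_key, value)` tuple. `group_into_batches` and `max_batch_size` are hypothetical names, not a driver API. In a real client, each returned group would become one batch of prepared inserts sent through the driver.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Tuple

Row = Tuple[Hashable, Any, Any]  # (partition_key, clustering_key, value)


def group_into_batches(rows: List[Row], max_batch_size: int = 100) -> List[List[Row]]:
    """Split rows into batches that each touch exactly one partition key.

    Keeping a batch inside one partition means the coordinator does not have
    to fan it out to many different replica sets.
    """
    by_partition: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        by_partition[row[0]].append(row)

    batches: List[List[Row]] = []
    for partition_rows in by_partition.values():
        # Cap batch size so one hot partition does not produce a huge batch.
        for start in range(0, len(partition_rows), max_batch_size):
            batches.append(partition_rows[start:start + max_batch_size])
    return batches


if __name__ == "__main__":
    rows = [("sensor-1", 0, 10), ("sensor-2", 0, 7), ("sensor-1", 1, 11)]
    for batch in group_into_batches(rows):
        print(batch)
```

The `max_batch_size` cap is an illustrative choice; the point is only that no batch mixes partition keys.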

Use DateTieredCompactionStrategy (https://labs.spotify.com/2014/12/18/date-tiered-compaction/).

And, more importantly, don't try to optimize before you actually need it.

In any case, CASSANDRA-6412 is very unlikely to make it into Cassandra before 3.1 or 3.2, if
at all, so any wins that you could get from your blob-packing will be negated by the need
to do a read before every write.
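To make the read-before-write cost concrete, here is a sketch of the optimistic compare-and-set loop the issue describes: read the blob and its version, patch a few values, write back only if the version is unchanged, otherwise retry. `VersionedStore` is a hypothetical in-memory stand-in for a table updated with a conditional (lightweight transaction) write; it is not a Cassandra API.

```python
import threading
from array import array
from typing import Dict, Tuple


class VersionedStore:
    """Hypothetical in-memory stand-in for a table holding (blob, version) per key."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[bytes, int]] = {}
        self._lock = threading.Lock()

    def put(self, key: str, blob: bytes, version: int = 0) -> None:
        with self._lock:
            self._data[key] = (blob, version)

    def read(self, key: str) -> Tuple[bytes, int]:
        with self._lock:
            return self._data[key]

    def compare_and_set(self, key: str, expected_version: int, blob: bytes) -> bool:
        """Store blob with version + 1 only if the stored version still matches."""
        with self._lock:
            _, current = self._data[key]
            if current != expected_version:
                return False
            self._data[key] = (blob, current + 1)
            return True


def patch_blob(store: VersionedStore, key: str, updates: Dict[int, int],
               max_attempts: int = 10) -> int:
    """Apply {index: new_value} to the int array under key; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        blob, version = store.read(key)  # the read that must precede every write
        values = array("i")
        values.frombytes(blob)
        for index, value in updates.items():
            values[index] = value
        if store.compare_and_set(key, version, values.tobytes()):
            return attempt
    raise RuntimeError("too much contention; giving up")
```

Every partial update pays for a full blob read plus a conditional write (Paxos-backed in Cassandra), and contention adds retries on top; that is the overhead the blob-packing approach has to win back.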

You also lose convenient querying with limits smaller than 1024 values, and the ability to reuse
the 3.0 aggregate functions on your values. It also complicates MR/Spark jobs and loses the
ability to use some of their predefined methods.

> Allow custom code to control behavior of reading and compaction
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-8543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8543
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Pavol Slamka
>            Priority: Minor
>
> When storing series data in blob objects for speed, it is sometimes
necessary to change only a few values of a single blob (say, a few integers out of 1024 integers).
Right now one could rewrite these using compare and set and versioning: read the blob and its version,
change a few values, write the whole updated blob with an incremented version if the version did not change,
and repeat the whole process otherwise (optimistic approach). However, compare and set brings some
overhead. Let's try to leave out compare and set: instead of reading and updating, let's
write only a "blank" blob with just a few values set. A blank blob contains special placeholder
data such as NULL, the max value of int, or similar. Since this write in fact only appends a new
SSTable record, the old data has not been overwritten yet. That happens during read or compaction.
But if we provided a custom read and a custom compaction which, instead of replacing the blob with
the new "sparse blank" blob, would replace values in the first blob (first SSTable record)
with only the "non-blank" values from the second blob (second SSTable record), we would achieve fast
partial blob updates without compare and set, on a last-write-wins basis. Is such an approach feasible?
Would it be possible to customize Cassandra so that custom code for compaction and data reading
could be provided for a column (blob)?
> There may be other, better solutions, but speed-wise this seems best to me. Sorry for
any mistakes; I am new to Cassandra.
> Thanks.
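The overlay the issue proposes can be sketched as a pure function over two blobs of packed 32-bit integers, assuming the maximum signed int (2**31 - 1) is the blank placeholder, one of the options the issue mentions. `BLANK`, `merge_sparse` and `sparse_update` are hypothetical names; this only illustrates the slot-level merge that a custom read or compaction hook would have to perform.

```python
import struct
from typing import Dict, List

BLANK = 2**31 - 1  # hypothetical placeholder meaning "no value written in this slot"


def pack(values: List[int]) -> bytes:
    """Pack signed 32-bit ints into a little-endian blob."""
    return struct.pack(f"<{len(values)}i", *values)


def unpack(blob: bytes) -> List[int]:
    """Inverse of pack()."""
    return list(struct.unpack(f"<{len(blob) // 4}i", blob))


def sparse_update(size: int, updates: Dict[int, int]) -> bytes:
    """Build a 'blank' blob with only the given {index: value} slots set."""
    values = [BLANK] * size
    for index, value in updates.items():
        values[index] = value
    return pack(values)


def merge_sparse(older: bytes, newer: bytes) -> bytes:
    """Overlay the non-blank slots of `newer` onto `older` (last write wins per slot)."""
    old_vals, new_vals = unpack(older), unpack(newer)
    if len(old_vals) != len(new_vals):
        raise ValueError("blobs must hold the same number of values")
    return pack([n if n != BLANK else o for o, n in zip(old_vals, new_vals)])


if __name__ == "__main__":
    base = pack(list(range(8)))
    patch = sparse_update(8, {2: 200, 5: 500})
    print(unpack(merge_sparse(base, patch)))  # [0, 1, 200, 3, 4, 500, 6, 7]
```

Because this overlay is associative, folding records pairwise in timestamp order gives the same result regardless of how a compaction groups them. Note that a real value equal to `BLANK` can no longer be stored.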



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
