cassandra-user mailing list archives

From DuyHai Doan <>
Subject Re: A blog about Cassandra in the IoT arena
Date Fri, 24 Aug 2018 16:06:02 GMT
No, what I meant by infinite partitions is not auto sub-partitioning, even
server-side. Ideally Cassandra should be able to support infinite partition
sizes and make compaction, repair and streaming of such partitions manageable:

- compaction: find a way to iterate super efficiently through the whole
partition and merge-sort all sstables containing data of the same partition

 - repair: find another approach than Merkle trees, because their resolution
is not granular enough. Ideally repair resolution should be at the clustering
level, or every xxx clustering values

 - streaming: same idea as repair; in case of error/disconnection the
stream should resume at the latest clustering-level checkpoint, or at
least we should checkpoint every xxx clustering values

 - partition index: find a way to index huge partitions efficiently.
Right now huge partitions have a dramatic impact on the partition index. The
work of Michael Kjellman on Birch indices is going in the right direction
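To make the compaction point concrete, here is a minimal sketch (not Cassandra's actual code; the tuple layout and names are illustrative) of how a partition spread across several sorted sstable runs can be merge-sorted in a single streaming pass, reconciling duplicate clustering keys last-write-wins:

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Toy model: each "sstable" holds one partition's rows as
# (clustering_key, write_timestamp, value) tuples sorted by clustering key.
sstable_a = [("c1", 10, "a1"), ("c3", 10, "a3")]
sstable_b = [("c1", 20, "b1"), ("c2", 15, "b2")]

def compact(*runs):
    # heapq.merge streams the sorted runs in clustering order without
    # materialising the whole partition; groupby then reconciles rows
    # sharing a clustering key last-write-wins, so memory stays bounded
    # no matter how large the partition grows.
    merged = heapq.merge(*runs, key=itemgetter(0))
    for _, cells in groupby(merged, key=itemgetter(0)):
        yield max(cells, key=itemgetter(1))  # highest timestamp wins

print(list(compact(sstable_a, sstable_b)))
# -> [('c1', 20, 'b1'), ('c2', 15, 'b2'), ('c3', 10, 'a3')]
```

Because the merge yields rows incrementally, the same loop could also emit a checkpoint every xxx clustering values, which is the property the repair and streaming points above ask for.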

About tombstones, there is a recent research paper about Dotted DB and an
attempt to implement deletes without using tombstones:

On Fri, Aug 24, 2018 at 12:38 AM, Rahul Singh <> wrote:

> Agreed. One of the ideas I had on partition size is to automatically
> synthetically shard based on some basic patterns seen in the data.
> It could be implemented as a tool that would create a new table with an
> additional part of the key that is an automatically created shard, or it
> would use an existing key and then migrate the data.
> The internal automatic shard would adjust as needed and keep
> “subpartitions” or “rowsets” but return the full partition given some
> special CQL.
> This is done today at the data access layer and in the data model design,
> but it’s pretty much a step-by-step process that could be done
> algorithmically.
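The synthetic sharding described above is what many teams hand-roll at the data access layer today. A minimal sketch (all names and the shard count are illustrative, and a plain dict stands in for the cluster): writes go to a physical key of (partition_key, shard), and reads fan out across the shards and merge, returning the full logical partition:

```python
import hashlib

NUM_SHARDS = 8  # illustrative; a real tool would size this from observed data patterns

def shard_of(clustering_key: str) -> int:
    # Deterministic shard choice, so reads know where each row lives.
    digest = hashlib.md5(clustering_key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

def write_row(table: dict, pk: str, ck: str, value) -> None:
    # Physically key rows by (pk, shard): each bucket stays small even
    # when the logical partition grows without bound.
    table.setdefault((pk, shard_of(ck)), {})[ck] = value

def read_partition(table: dict, pk: str):
    # Fan out to every shard and merge, returning the full logical
    # partition just as a single-partition query would.
    rows = {}
    for shard in range(NUM_SHARDS):
        rows.update(table.get((pk, shard), {}))
    return sorted(rows.items())

table = {}
for i in range(100):
    write_row(table, "sensor-42", f"reading-{i:03d}", i)
print(read_partition(table, "sensor-42")[0])
# -> ('reading-000', 0)
```

An internal, self-adjusting version of this is exactly the step-by-step process that could be done algorithmically server-side.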
> Regarding tombstones: maybe we could have another thread dedicated to
> cleaning tombstones, separate from compaction. Depending on the amount of
> tombstones and a threshold, it would be dedicated to deletion. It may be an
> edge case, but people face issues with tombstones all the time because
> they don’t know better.
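As a toy sketch of that threshold idea (this is not Cassandra's implementation; the cell layout, constants, and function name are all hypothetical), a dedicated pass would fire only when the tombstone ratio crosses a threshold, and would drop only tombstones older than the grace period:

```python
import time

GC_GRACE_SECONDS = 10 * 24 * 3600   # illustrative grace period
TOMBSTONE_RATIO_THRESHOLD = 0.2     # trigger a purge past 20% tombstones

def maybe_purge(cells, now=None):
    # cells: (key, value, deleted_at) triples; deleted_at is None for
    # live cells. Unlike compaction-driven cleanup, this pass runs only
    # when tombstones exceed the threshold, and keeps any tombstone
    # still inside the grace period.
    now = time.time() if now is None else now
    tombstones = sum(1 for _, _, d in cells if d is not None)
    if tombstones / max(len(cells), 1) <= TOMBSTONE_RATIO_THRESHOLD:
        return cells  # below threshold: not worth a dedicated pass
    return [c for c in cells
            if c[2] is None or now - c[2] < GC_GRACE_SECONDS]

now = 1_000_000_000
cells = [("a", 1, None), ("b", None, now - GC_GRACE_SECONDS - 1),
         ("c", 3, None), ("d", None, now - 60)]
print(maybe_purge(cells, now))
# -> [('a', 1, None), ('c', 3, None), ('d', None, 999999940)]
```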
> Rahul
> On Aug 23, 2018, 11:50 AM -0500, DuyHai Doan <>,
> wrote:
> As I used to tell some people, the day we make:
> 1. partition size unlimited, or at least huge partitions easily manageable
> (compaction, repair, streaming, partition index file)
> 2. tombstones a non-issue
> that day, Cassandra will dominate any other IoT technology out there.
> Until then ...
> On Thu, Aug 23, 2018 at 4:54 PM, Rahul Singh <
> > wrote:
>> Good analysis of how the different key structures affect use cases and
>> performance. I think you could extend this article with a potential
>> evaluation of FiloDB, which specifically tries to solve the OLAP issue with
>> arbitrary queries.
>> Another option is leveraging Elassandra (index in Elasticsearch
>> co-located with C*) or DataStax (index in Solr co-located with C*).
>> I personally haven’t used SnappyData, but that’s another Spark-based DB
>> that could be leveraged for performant real-time queries on the OLTP side.
>> Rahul
>> On Aug 23, 2018, 2:48 AM -0500, Affan Syed <>, wrote:
>> Hi,
>> We wrote a blog post about some of the results that engineers from AN10
>> shared earlier.
>> I am sharing it here for broader comments and discussion.
>> they-a-good-match/
>> Thank you.
>> - Affan
