cassandra-commits mailing list archives

From "Anubhav Kale (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-11407) Proposal for simplified DTCS
Date Tue, 22 Mar 2016 22:43:25 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anubhav Kale updated CASSANDRA-11407:
-------------------------------------
    Description: 
Today's DTCS implementation has been discussed and debated in a few JIRAs already (the notable
one is https://issues.apache.org/jira/browse/CASSANDRA-9666). One of the main challenges with
the current approach is that it is very difficult to reason about how the "Target" class makes
buckets, thus making it difficult to reason about the expected file layout on disk.

I am proposing a simplification to the current approach that keeps intact most of the DTCS
properties that make it a great fit for time-series data. The simplification is as follows.

Given the min and max timestamps across all SSTables in question, start from min and make
windows based on base and min_threshold. The logic in GetWindow simply tries to fit
maximum-sized windows from min to max.
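
As a rough illustration of the windowing described above, here is a minimal Python sketch.
It is not the code in the attached patch. It assumes that window sizes are base multiplied
by powers of min_threshold, that windows are half-open, and that max itself must fall inside
the last window. The name make_windows and its parameters are hypothetical.

```python
from typing import List, Tuple


def make_windows(min_ts: int, max_ts: int, base: int, min_threshold: int) -> List[Tuple[int, int]]:
    """Split the timestamp range into half-open windows, oldest first.

    Assumption (not taken from the patch): allowed window sizes are
    base * min_threshold**k, and at each step we greedily pick the
    largest size that still fits before the end of the range.
    """
    if base <= 0 or min_threshold < 2:
        raise ValueError("base must be positive and min_threshold at least 2")

    end = max_ts + 1  # exclusive end, so that max_ts itself is covered
    windows = []
    start = min_ts
    while start < end:
        size = base
        # Grow by factors of min_threshold while the bigger window still fits.
        while start + size * min_threshold <= end:
            size *= min_threshold
        # If less than one base unit remains, this last window overshoots end.
        windows.append((start, start + size))
        start += size
    return windows


print(make_windows(0, 20, base=1, min_threshold=4))
# [(0, 16), (16, 20), (20, 21)]
```

Under these assumptions the oldest data lands in the largest window and the windows shrink
towards max, without waiting for min_threshold smaller windows to accumulate first.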

This keeps the DTCS properties intact except that we don't need to wait for min_threshold
windows before making a bigger one. I would argue this greatly simplifies the algorithm,
is easy to reason about, and the end result isn't drastically different from the original
DTCS in most cases. We give up on the "alignment" logic that exists in the current implementation,
but I honestly don't think it buys us a lot besides complexity.

The implementation can obviously be optimized and cleaned up more if folks think this is a
good idea. 






  was:
Today's DTCS implementation has been discussed and debated in a few JIRAs already (the notable
one is https://issues.apache.org/jira/browse/CASSANDRA-9666). One of the main challenges with
the current approach is that it is very difficult to reason about how the "Target" class makes
buckets, thus making it difficult to reason about the expected file layout on disk.

I am proposing a simplification to the current approach that keeps intact most of the DTCS
properties that make it a great fit for time-series data. The simplification is as follows.

Given the min and max timestamps across all SSTables in question, start from min and make
windows based on base and min_threshold. The logic in GetWindow simply tries to fit
maximum-sized windows from min to max.

This keeps the DTCS properties intact except that we don't need to wait for min_threshold
windows before making a bigger one. I would argue this greatly simplifies the algorithm,
is easy to reason about, and the end result isn't drastically different from the original
DTCS in most cases. We give up on the "alignment" logic in the current class, but I honestly don't
think it buys us a lot besides complexity.

The implementation can obviously be optimized and cleaned up more if folks think this is a
good idea. 







> Proposal for simplified DTCS
> ----------------------------
>
>                 Key: CASSANDRA-11407
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11407
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Anubhav Kale
>         Attachments: 0001-Simple-DTCS.patch
>
>
> Today's DTCS implementation has been discussed and debated in a few JIRAs already (the
> notable one is https://issues.apache.org/jira/browse/CASSANDRA-9666). One of the main challenges
> with the current approach is that it is very difficult to reason about how the "Target" class
> makes buckets, thus making it difficult to reason about the expected file layout on disk.
> I am proposing a simplification to the current approach that keeps intact most of the DTCS
> properties that make it a great fit for time-series data. The simplification is as follows.
> Given the min and max timestamps across all SSTables in question, start from min and
> make windows based on base and min_threshold. The logic in GetWindow simply tries to fit
> maximum-sized windows from min to max.
> This keeps the DTCS properties intact except that we don't need to wait for min_threshold
> windows before making a bigger one. I would argue this greatly simplifies the algorithm,
> is easy to reason about, and the end result isn't drastically different from the original
> DTCS in most cases. We give up on the "alignment" logic that exists in the current implementation,
> but I honestly don't think it buys us a lot besides complexity.
> The implementation can obviously be optimized and cleaned up more if folks think this
> is a good idea.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
