cassandra-user mailing list archives

From Anubhav Kale <Anubhav.K...@microsoft.com>
Subject DTCS bucketing Question
Date Thu, 17 Mar 2016 17:24:33 GMT
<Not sure if this is the right alias or Dev, so asking in both places>

Hello,

I am trying to understand concretely how DTCS makes buckets. I have been looking at the
DateTieredCompactionStrategyTest.testGetBuckets method and playing with some of the parameters
to the getBuckets call (Cassandra 2.1.12).

I don't think I fully understand something there. Let me try to explain.

Consider the second test there. I changed the pairs a bit for easier explanation and set
base (initial window size) = 1000L and min_threshold = 2:

pairs = Lists.newArrayList(
                Pair.create("a", 200L),
                Pair.create("b", 2000L),
                Pair.create("c", 3600L),
                Pair.create("d", 3899L),
                Pair.create("e", 3900L),
                Pair.create("f", 3950L),
                Pair.create("too new", 4125L)
        );
        buckets = getBuckets(pairs, 1000L, 2, 4050L, Long.MAX_VALUE);

In this case, the buckets should look like [0-4000] [4000-]. Is this correct? The buckets
that I get back are different ("a" lives in its own bucket and everyone else in another). What
am I missing here?

Another case:

pairs = Lists.newArrayList(
                Pair.create("a", 200L),
                Pair.create("b", 2000L),
                Pair.create("c", 3600L),
                Pair.create("d", 3899L),
                Pair.create("e", 3900L),
                Pair.create("f", 3950L),
                Pair.create("too new", 4125L)
        );
        buckets = getBuckets(pairs, 50L, 4, 4050L, Long.MAX_VALUE);

Here, the buckets should be [0-3200] [3200-4000] [4000-4050] [4050-]. Is this correct? Again,
the buckets that come back are quite different.
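
The expected windows in both cases can be reproduced with a greedy walk from 0: take the
largest window of size base * min_threshold^k that still fits before "now", then repeat on
what is left. The sketch below only encodes that expectation; it is not what DTCS does. The
names GreedyWindowSketch and boundaries are made up for illustration and are not part of
Cassandra.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Cassandra code: encodes the "largest window first, from 0" expectation.
class GreedyWindowSketch {
    // Returns window boundaries [0, b1, ..., bk]; the last window is open-ended, starting at bk.
    static List<Long> boundaries(long base, int minThreshold, long now) {
        List<Long> bounds = new ArrayList<>();
        long start = 0;
        bounds.add(start);
        // Keep going while at least one window of the smallest size still fits before now.
        while (now - start >= base) {
            // Grow the window by factors of minThreshold for as long as it still fits.
            long size = base;
            while (size * minThreshold <= now - start) {
                size *= minThreshold;
            }
            start += size;
            bounds.add(start);
        }
        return bounds;
    }

    public static void main(String[] args) {
        System.out.println(boundaries(1000L, 2, 4050L)); // [0, 4000]
        System.out.println(boundaries(50L, 4, 4050L));   // [0, 3200, 4000, 4050]
    }
}
```

The first result corresponds to [0-4000] [4000-] and the second to [0-3200] [3200-4000]
[4000-4050] [4050-].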

Note that if I keep the base at its original value (100L), or increase it, and play with
min_threshold, the results are exactly what I would expect.

The way I think about DTCS is: try to make buckets of the maximum possible size starting from 0,
and once you can't do that, make smaller buckets (similar to what the comment suggests). Is this
mental model wrong? I am afraid that the math in the Target class is somewhat hard to follow,
so I am thinking about it this way.
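
For comparison, the Target walk can be approximated by the standalone sketch below. It is a
simplification under stated assumptions, not the Cassandra implementation: the Target arithmetic
is written from a reading of the 2.1 branch and may not match 2.1.12 exactly, the max window
size argument (Long.MAX_VALUE in the calls above) is left out, and a Map of name to timestamp
stands in for the Pair list. The class name DtcsBucketSketch is made up. In this reading, the
walk starts at the window containing "now" and moves backwards, which differs from starting at 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical standalone sketch; not the Cassandra class.
class DtcsBucketSketch {
    // A window of width `size` covering [divPosition * size, (divPosition + 1) * size).
    static final class Target {
        final long size;
        final long divPosition;

        Target(long size, long divPosition) {
            this.size = size;
            this.divPosition = divPosition;
        }

        // > 0: timestamp is older than this window; < 0: newer; 0: inside it.
        int compareToTimestamp(long timestamp) {
            return Long.compare(divPosition, timestamp / size);
        }

        // Step one window back in time; widen by `base` only on an aligned boundary.
        Target next(int base) {
            if (divPosition % base > 0) {
                return new Target(size, divPosition - 1);
            }
            return new Target(size * base, divPosition / base - 1);
        }
    }

    // timeUnit is the initial window size; base is what the post calls min_threshold.
    static List<List<String>> getBuckets(Map<String, Long> files, long timeUnit, int base, long now) {
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(files.entrySet());
        sorted.sort((x, y) -> Long.compare(y.getValue(), x.getValue())); // newest first

        List<List<String>> buckets = new ArrayList<>();
        Target target = new Target(timeUnit, now / timeUnit); // window containing "now"
        int i = 0;
        while (i < sorted.size()) {
            int cmp = target.compareToTimestamp(sorted.get(i).getValue());
            if (cmp < 0) {        // file is newer than the current window: skip it
                i++;
            } else if (cmp > 0) { // file is older: move to the next (older) window
                target = target.next(base);
            } else {              // collect every file that falls in the current window
                List<String> bucket = new ArrayList<>();
                while (i < sorted.size() && target.compareToTimestamp(sorted.get(i).getValue()) == 0) {
                    bucket.add(sorted.get(i).getKey());
                    i++;
                }
                buckets.add(bucket);
            }
        }
        return buckets;
    }
}
```

Calling getBuckets with the two sets of pairs above (as a Map) and printing the result makes it
possible to compare this reading against what the test returns.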

Thanks a lot in advance.

-Anubhav
