On 11/01/2013 09:15 PM, Robert Coli wrote:
I think you already guessed the answer :) It is a production
cluster, and our applications needed some features (particularly
compare-and-set) that are only present in 2.0. Besides, somebody
had to discover the regression, right? :) Thanks for the link.
We are in the middle of the migration from 1.2.9 to 2.0, and we are
also upgrading our application, which can only run against 2.0 due
to various technical details. It is rather hard to explain, but we
hoped this state would last just a few days, and it is definitely
not one we wanted to keep. Since we hit the bug, we have been stalled
in the middle of the migration.
Hmm, thinking about it a bit more, I am not sure this will actually
help. If I understand things correctly, assuming a uniform
distribution of newly received keys in L0 (ensured by
RandomPartitioner), for LCS to work optimally I need:
a) a uniform distribution of keys across the sstables in one level,
i.e. in every level each sstable covers a key range of more or less
the same size
b) the sstables in each level to cover almost the whole space of
keys the node is responsible for
c) sstables to be propagated to higher levels in a uniform fashion,
e.g. round-robin or random (over time, the probability of choosing
an sstable as a candidate should be the same for all sstables in
the level)
By splitting the sorted big sstable, I will get a bunch of
non-overlapping sstables, so I will surely achieve a). Point c) is
fixed by the patch. But what about b)? It probably depends on the
order of compaction across levels, i.e. whether the compactions in
the various levels run in parallel and interleaved or not. If it
compacts all the sstables in one level and only then starts to
compact the sstables in the next higher level, and so on, one will
end up in a situation very similar to the one caused by the
referenced bug (because of the round-robin choice of candidates),
i.e. with the biggest keys in L1 and the smallest keys in the
highest level. In that case it would actually not help at all.
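To make the worry about b) concrete, here is a toy sketch in Python.
It is not Cassandra code: the integer keys, the chunk size and the
helper names (split_sorted, coverage) are all made up for
illustration, and "promotion" is modelled as simply moving whole
sstables up in key order, which is only my assumption about the
worst case.

```python
def split_sorted(keys, chunk_size):
    """Split an already sorted key list into consecutive chunks.

    Each chunk stands in for one sstable; because the input is sorted,
    the chunks cannot overlap.
    """
    return [keys[i:i + chunk_size] for i in range(0, len(keys), chunk_size)]


def coverage(level, lo, hi):
    """Fraction of the key space [lo, hi] spanned by a level's sstables."""
    if not level:
        return 0.0
    start = min(sstable[0] for sstable in level)
    end = max(sstable[-1] for sstable in level)
    return (end - start) / (hi - lo)


# Toy key space: 1000 sorted integer keys standing in for tokens.
keys = list(range(1000))
sstables = split_sorted(keys, 100)  # 10 non-overlapping "sstables"

# Assumed worst case: candidates are taken in key order and one level
# is processed completely before the next one starts, so the half with
# the smallest keys moves up while the biggest keys stay behind.
l2 = sstables[:5]
l1 = sstables[5:]

print("L1 coverage:", coverage(l1, keys[0], keys[-1]))
print("L2 coverage:", coverage(l2, keys[0], keys[-1]))
```

In this toy model a) holds after the split, but each level ends up
spanning only about half of the key space, which is the opposite of
what b) asks for.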
Does it make sense or am I completely wrong? :)
BTW: Not a very thought-out idea, but wouldn't it actually be better
to select candidates completely randomly?
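
A rough way to compare the two selection strategies, again as a toy
sketch and not the actual LCS candidate selection: sstables are just
indices 0..n-1 sorted by key, and the function names and the fixed
seed are my own assumptions. Both strategies pick every sstable about
equally often over time; the difference is only in the order.

```python
import random
from collections import Counter


def round_robin_picks(n_sstables, n_rounds):
    """Pick candidates in key order, wrapping around at the end."""
    return [i % n_sstables for i in range(n_rounds)]


def random_picks(n_sstables, n_rounds, rng):
    """Pick each candidate uniformly at random."""
    return [rng.randrange(n_sstables) for _ in range(n_rounds)]


rng = random.Random(42)  # fixed seed so the run is repeatable

rr_counts = Counter(round_robin_picks(10, 10_000))
rnd_counts = Counter(random_picks(10, 10_000, rng))

print("round-robin counts:", sorted(rr_counts.items()))
print("random counts:     ", sorted(rnd_counts.items()))

# The first few picks show the difference in order: round-robin always
# starts with the sstables holding the smallest keys.
print("first round-robin picks:", round_robin_picks(10, 5))
print("first random picks:     ", random_picks(10, 5, random.Random(42)))
```

So over a long run both satisfy c), but a short round-robin pass
always touches a contiguous slice of the key space, while a random
pass should be spread across it.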