incubator-cassandra-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "J. Andrew Rogers" <>
Subject Re: Order preserving partitioning strategy
Date Thu, 26 Aug 2010 04:25:34 GMT
Hi Jonathan,

I've never seen a paper that discusses it as a primary topic, it is
always in some other context. IIRC, the most recent discussions of it
I have seen have been in join algorithm literature from somewhere in
Asia. MPP analytical databases often implement some form of skew
adaptivity but there is no standard way because the design tradeoffs
are context dependent. DB2 also has a non-adaptive technique for
dealing with skew that should be simple to implement on Cassandra and
might provide an 80/20 option (more on that a little further on).

Skew adaptivity is generally implemented with a mix of data structures
along the lines of an adaptive quad-tree. The reason you only see this
in analytical databases is that the data skew is unlikely to change
much and/or have too much concurrent updating. If the distribution
radically changes all of the time under high concurrency, it will
create some mix of resource contention, lost selectivity, or runaway
space consumption depending on implementation detail. The optimal mix
of pain tends to be a compile-time option, so it isn't very flexible.
Definitely not optimal for concurrent OLTP-ish workloads.


IBM's DB2 has a couple different data organization options that
essentially define partitionable skew invariants. The closer the real
data distribution is to the skew invariant, the more access
performance is like a distributed hash table. For many data models,
the data distribution skew can be roughly predicted ahead of time and
for those applications it is relatively efficient. You can take this
pretty far; DB2 has libraries of skew invariants based on irregular
Voronoi tesselation and I believe the most recent version of SQL
Server has something similar. I think they even have tools for
sampling representative data for the purposes of finding a good skew

It is not much but I hope that helps.

Data distribution skew handling only partly mitigates data access skew
issues (e.g. temporal data) but it is better than nothing.

J. Andrew Rogers

On Tue, Aug 24, 2010 at 10:30 AM, Jonathan Ellis <> wrote:
> What are some good papers to read for background?
> On Tue, Aug 24, 2010 at 12:26 PM, J. Andrew Rogers
> <> wrote:
>> On Mon, Aug 23, 2010 at 8:36 PM, Hien. To Trong <> wrote:
>>> OrderPreservingPartitioner is efficient range queries but can cause
>>> unevently distributed data. Does anyone has an idea of a
>>> HybridPartitioner which takes advantages of both RandomPartitioner
>>> and OPP, or at least a partitioner trade off between them.
>> What you are looking for is skew adaptive partitioning i.e. like a
>> B+Tree except distributable.
>> A couple different methods for doing something like this exist, but
>> you rarely see them and they have their own (different) tradeoffs. To
>> the best of my knowledge, implementation requires a fairly deep
>> architectural commitment; it is more involved than simply defining a
>> partitioning function and the "adaptive" aspect must be distribution
>> friendly. It is an active area of research in the literature with no
>> obvious and simple solutions that can be lashed onto a database engine
>> "as is".
>> --
>> J. Andrew Rogers
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support

View raw message