cassandra-user mailing list archives

From aaron morton <>
Subject Re: two dimensional slicing
Date Mon, 30 Jan 2012 18:34:16 GMT
(Not trolling) but do you have any ideas on how?

The token produced by the partitioner is used as the key in the distributed hash table (DHT),
so we can map keys to nodes and evenly distribute load. If the range of tokens for the DHT
is infinite, it's difficult to map them evenly to a finite set of nodes.
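
As a rough illustration of that mapping (a sketch, not Cassandra's actual code), the snippet below hashes row keys to tokens and assigns each token to a node on a ring. It assumes four hypothetical nodes with evenly spaced tokens on a 2**127 ring and an MD5-based token, which is similar in spirit to RandomPartitioner. The names `NODE_TOKENS`, `hash_token`, `owner` and `distribution` are made up for this example.

```python
import bisect
import hashlib
from collections import Counter

RING_SIZE = 2 ** 127  # assumed token space, similar in spirit to RandomPartitioner
NODE_COUNT = 4        # hypothetical cluster size

# Each node owns the range that ends at its token; tokens are evenly spaced.
NODE_TOKENS = [(i + 1) * RING_SIZE // NODE_COUNT - 1 for i in range(NODE_COUNT)]


def hash_token(row_key: str) -> int:
    """Map a row key to a token in [0, RING_SIZE) using MD5."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner(token: int) -> int:
    """Return the index of the first node whose token is >= the given token."""
    return bisect.bisect_left(NODE_TOKENS, token) % NODE_COUNT


def distribution(keys):
    """Count how many keys land on each node."""
    return Counter(owner(hash_token(k)) for k in keys)


if __name__ == "__main__":
    # Sequential keys still spread across all nodes because they are hashed first.
    keys = [f"version-{n:08d}" for n in range(10_000)]
    print(sorted(distribution(keys).items()))
```

Because the token space is bounded and the hash spreads keys across it, evenly spaced node tokens give each node a similar share of rows, even when the row keys themselves are sequential.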


If you know that the number of DHT keys (and so row keys) is finite, then it is easier to
use the BOP (ByteOrderedPartitioner).
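
A minimal sketch of why a finite, known key space helps, under assumptions that are not from this thread: one million integer row keys, zero-padded so byte order matches numeric order, and four hypothetical nodes. With an order-preserving partitioner the token is the key itself, so node tokens can be picked as evenly spaced keys. `KEY_COUNT`, `encode` and `owner` are illustrative names only.

```python
import bisect
from collections import Counter

KEY_COUNT = 1_000_000  # assumed: the total number of row keys is known up front
NODE_COUNT = 4         # hypothetical cluster size
WIDTH = 7              # zero-padding so byte order matches numeric order


def encode(n: int) -> bytes:
    """Encode an integer key so that byte-wise ordering equals numeric ordering."""
    return f"{n:0{WIDTH}d}".encode("ascii")


# The token is the key itself, so node tokens are evenly spaced keys
# taken from the known range.
NODE_TOKENS = [encode((i + 1) * KEY_COUNT // NODE_COUNT - 1) for i in range(NODE_COUNT)]


def owner(row_key: bytes) -> int:
    """Return the index of the first node whose token is >= the row key."""
    return bisect.bisect_left(NODE_TOKENS, row_key) % NODE_COUNT


if __name__ == "__main__":
    # Sampled over the whole key space, the rows are spread evenly.
    whole = Counter(owner(encode(n)) for n in range(0, KEY_COUNT, 1000))
    print("whole key space:", sorted(whole.items()))
    # The most recent sequential keys all land on the same node.
    latest = Counter(owner(encode(n)) for n in range(KEY_COUNT - 1000, KEY_COUNT))
    print("latest 1000 keys:", sorted(latest.items()))
```

The second count shows the trade-off: storage is balanced over the full range, but a run of increasing keys is written to a single node at a time.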

Or, if you know that the row keys are something like a time series, you could use the sort of
approach used with Horizontal Partitioning in an RDBMS and run a sliding window of nodes. Every
month, drop the oldest partition / node off the end and add a new one for the next month.
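
The sliding window idea could be sketched roughly as below. This only models the bookkeeping (which monthly partitions exist and which one a timestamp belongs to), not any real Cassandra operation. The `SlidingWindow` class, its methods, and the 'YYYY-MM' naming are assumptions made for illustration.

```python
from collections import deque


def next_month(month: str) -> str:
    """Return the month after a 'YYYY-MM' string."""
    year, mon = map(int, month.split("-"))
    year, mon = (year + 1, 1) if mon == 12 else (year, mon + 1)
    return f"{year:04d}-{mon:02d}"


class SlidingWindow:
    """Keep one partition (node) per month for the most recent `size` months."""

    def __init__(self, first_month: str, size: int):
        self.months = deque()
        month = first_month
        for _ in range(size):
            self.months.append(month)
            month = next_month(month)

    def roll(self) -> str:
        """Drop the oldest month, add the next one, and return the dropped month."""
        newest = self.months[-1]
        dropped = self.months.popleft()
        self.months.append(next_month(newest))
        return dropped

    def partition_for(self, timestamp: str):
        """Return the 'YYYY-MM' partition for an ISO date, or None if outside the window."""
        month = timestamp[:7]
        return month if month in self.months else None


if __name__ == "__main__":
    window = SlidingWindow("2011-11", size=3)
    print(list(window.months))
    print("dropped:", window.roll())
    print(list(window.months))
```

Each call to `roll()` stands in for the monthly step of retiring the oldest partition and bringing up a new one for the coming month.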

Just some thoughts.

Aaron Morton
Freelance Developer

On 30/01/2012, at 7:19 PM, Terje Marthinussen wrote:

> On Sun, Jan 29, 2012 at 7:26 PM, aaron morton <> wrote:
>> and compare them, but at this point I need to focus on one to get
>> things working, so I'm trying to make a best initial guess.
> I would go for RP then. BOP may look like less work to start with but it *will* bite
> you later. If you use an increasing version number as a key you will get a hot spot. Get it
> working with RP and Standard CFs, accept the extra lookups, and then see where you are
> performance / complexity wise. Cassandra can be pretty fast.
> Of course, there is no guarantee that it will bite you.
> Whatever data hotspot you may get may very well be minor vs. the advantage of slicing
> continuous blocks of data on a single server vs. random bits and pieces all over the place.
> For instance, there are many large data repositories out there of analytic data which
> only have a few queries per hour. BOP will most likely have no performance problems at all
> for many of these; indeed, it may be much faster than the alternatives.
> BOP is very useful and powerful for many things and saves a fair chunk of development
> time vs. the alternatives when you can use it.
> If we really want everybody to stop using it, we should change Cassandra so that by default
> it can provide the same function in some other way, without adding days and maybe weeks of
> development and extra complexity to your project.
> Terje
