cassandra-user mailing list archives

From Gary Shi <>
Subject Re: Why SSTable is sorted by tokens instead of row keys?
Date Sat, 05 Nov 2011 00:45:07 GMT
Thank you.

I agree that asking "lots of" machines to process a single query could be
slow if there are hundreds of them rather than dozens. Would a cluster of,
e.g., 4-20 nodes behave well if we spread the query across all nodes?
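
To make the "spread the query to all nodes" idea concrete, here is a minimal
sketch in plain Python, not Cassandra code. It assumes each node holds its
rows sorted by key (which is not how Cassandra stores them) and that the
coordinator merges the partial results. The names node_range_scan and
scatter_gather are hypothetical.

```python
import heapq
from bisect import bisect_left, bisect_right


def node_range_scan(sorted_rows, start_key, end_key):
    """Rows with start_key <= key <= end_key from one node.

    sorted_rows is a list of (key, value) pairs sorted by key, standing in
    for a hypothetical key-sorted SSTable.
    """
    keys = [k for k, _ in sorted_rows]
    lo = bisect_left(keys, start_key)
    hi = bisect_right(keys, end_key)
    return sorted_rows[lo:hi]


def scatter_gather(nodes, start_key, end_key):
    """Ask every node for the range, then merge the sorted partial results."""
    partials = [node_range_scan(rows, start_key, end_key) for rows in nodes]
    return list(heapq.merge(*partials, key=lambda kv: kv[0]))


nodes = [
    [("2011-11-01", "a"), ("2011-11-04", "d")],
    [("2011-11-02", "b")],
    [("2011-11-03", "c"), ("2011-11-05", "e")],
]
result = scatter_gather(nodes, "2011-11-02", "2011-11-04")
```

Every node is contacted for every range query, so the number of requests
per query grows with the cluster size, which is the scalability concern.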

Many articles suggest modeling TimeUUIDs as columns instead of rows, but since
only one node can serve a single row, won't this lead to hot spot problems?
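
For illustration, one commonly described way to limit the hot spot is to
split the time series across several rows, keyed by a time bucket plus a
shard number, so that writes for the current hour land on more than one
row. The sketch below is plain Python and an assumption on my part, not
something proposed in this thread; the key layout "YYYYMMDDHH:shard" and
the names row_key and row_keys_for_range are hypothetical.

```python
import zlib
from datetime import datetime, timedelta


def row_key(event_time, event_id, shards=4):
    """Row key for one event: hourly bucket plus a shard chosen by hashing the id."""
    bucket = event_time.strftime("%Y%m%d%H")
    shard = zlib.crc32(event_id.encode("utf-8")) % shards
    return f"{bucket}:{shard}"


def row_keys_for_range(start, end, shards=4):
    """All row keys a reader must fetch to cover [start, end]."""
    keys = []
    t = start.replace(minute=0, second=0, microsecond=0)
    while t <= end:
        bucket = t.strftime("%Y%m%d%H")
        keys.extend(f"{bucket}:{s}" for s in range(shards))
        t += timedelta(hours=1)
    return keys


start = datetime(2011, 11, 4, 22, 28)
end = datetime(2011, 11, 5, 0, 45)
keys = row_keys_for_range(start, end, shards=2)
```

Within each row the columns would still be ordered by TimeUUID; the reader
fetches the listed rows by key and merges them, trading a few extra reads
for a write load that is spread over several rows.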
On 2011-11-4 at 10:28 PM, "Sylvain Lebresne" <> wrote:

> On Fri, Nov 4, 2011 at 1:49 PM, Gary Shi <> wrote:
> > I want to save time series event logs into Cassandra, and I need to load
> > them by key range (row key is time-based). But we can't use
> > RandomPartitioner in this way, while OrderPreservingPartitioner leads to
> > hot spot problems.
> >
> > So I wonder why Cassandra saves SSTables sorted by row token instead of
> > by key: if rows in an SSTable were sorted by key, it should be quite easy
> > to return rows by key range -- the token would be used to determine which
> > node contains the data. For key range requests, Cassandra could ask every
> > node for that range of rows, merge them and return to the caller.
> Without going for exhaustiveness:
> - Requesting every node is not too scalable. Cassandra is built to target
>   the 'lots of cheap machines' kind of cluster, so that kind of operation
>   is going the exact opposite way. In other words, that would be slow
>   enough that you're better off modeling this using columns for time
>   series.
> - That would make topology operations (bootstrap, move, decommission)
>   much more costly, because we wouldn't be able to tell which keys to move
>   unless we iterate over all the data each time.
> --
> Sylvain
> >
> > --
> > regards,
> > Gary Shi
> >
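
The point in the quoted reply about topology operations can be sketched as
follows. This is a toy model in plain Python, not Cassandra's
implementation: token() is an MD5-based stand-in for a hashing partitioner,
the range is taken as (lo, hi] for the purposes of the sketch, and all
function names are hypothetical.

```python
import hashlib
from bisect import bisect_right


def token(key):
    """Toy token: the MD5 digest of the key read as an integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def build_token_sorted(keys):
    """Parallel lists (tokens, keys), both ordered by token."""
    pairs = sorted((token(k), k) for k in keys)
    return [t for t, _ in pairs], [k for _, k in pairs]


def keys_in_range_token_sorted(tokens, keys_by_token, lo, hi):
    """Data sorted by token: two binary searches find the slice to move."""
    i = bisect_right(tokens, lo)
    j = bisect_right(tokens, hi)
    return keys_by_token[i:j]


def keys_in_range_key_sorted(keys_sorted, lo, hi):
    """Data sorted by key: every key must be hashed and checked."""
    return [k for k in keys_sorted if lo < token(k) <= hi]


all_keys = [f"event-{i}" for i in range(200)]
tokens, by_token = build_token_sorted(all_keys)
lo, hi = tokens[50], tokens[120]
fast = keys_in_range_token_sorted(tokens, by_token, lo, hi)
slow = keys_in_range_key_sorted(sorted(all_keys), lo, hi)
```

Both functions select the same keys, but the token-sorted layout finds them
as one contiguous slice, while the key-sorted layout has to iterate over
all the data.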
