incubator-cassandra-user mailing list archives

From: Jonathan Ellis <>
Subject: Re: Cassandra range scans
Date: Mon, 22 Feb 2010 06:19:06 GMT
[replying to list, with permission]

On Mon, Feb 22, 2010 at 12:05 AM,  <> wrote:
> I'm looking for a very scalable primary data store for a large web/API application. Our
> data consists largely of lists of things, per user. So a user has a bunch (dozens to hundreds)
> of thing A, some of thing B, a few of thing C, etc. There are social elements to the app w/
> shared data, so that could be modeled with each user having a list of pointers, but with writes
> being super cheap I'm more inclined to write everything everywhere (that's a side issue, but
> it's in the back of my mind). Users number in the millions.
> So basically I'm looking for something scalable, available, fast, and with native support
> for range scans (given that almost every query is fetching some list of things). This is where
> my questions lie... I'm pretty familiar with the Bigtable model and it suits my needs quite
> well, I would store thing A under a row key of "userid.thingid" (or similar) and then a range
> scan over "userid." will pick them all up at once.
> HBase has been top of my list in terms of data model, but I ran across a performance
> study which suggested it's questionable and the complexity of components gives me some pause.
> So Cassandra seems the other obvious choice. However, the data model isn't as clear to me
> (at least, not yet, which is probably just a terminology problem).
> My questions:
>  1) would you consider Cassandra (0.5+) "safe enough" for a primary data store?

Yes.  Several companies are deploying 0.5 in production.  It's pretty
solid.  (We'll have a 0.5.1 fixing some minor issues RSN, and a 0.6
beta.)  And I agree that it's significantly simpler to deploy (and
keep running) than HBase.

>  2) is the row key model I suggested above the best approach in Cassandra, or is there
> something better? My testing so far has been using get_range_slice with a ColumnParent of
> just the CF and SlicePredicate listing the columns I want (though really I want all columns,
> is there a shorthand for that?)

Cassandra deals fine with millions of columns per row, and allows
prefix queries on columns too.  So an alternate model would be to have
userX as row key, and column keys "A:1, A:2, A:3, ..., B:1, B:2, B:3,
...".  This will be marginally faster than splitting by row, and has
the added advantage of not requiring OPP.
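A toy sketch of the two layouts, with sorted Python dicts standing in for rows and columns (the key names and helper functions here are hypothetical illustrations; real access would go through the Thrift API, not code like this):

```python
# Toy illustration of the two layouts: plain Python structures stand in
# for Cassandra rows and columns.  All names and data are made up.
from bisect import bisect_left, bisect_right

# Layout 1: one row per (user, thing); the range scan over row keys
# only works when OrderPreservingPartitioner keeps keys sorted.
rows = {"user1.A1": "...", "user1.A2": "...", "user1.B1": "...", "user2.A1": "..."}

def row_range_scan(rows, prefix):
    # Emulates get_range_slice over row keys sharing a prefix.
    keys = sorted(rows)
    lo = bisect_left(keys, prefix)
    hi = bisect_right(keys, prefix + "\xff")
    return keys[lo:hi]

# Layout 2: one row per user, with the thing type encoded in the column
# name.  Columns are always sorted within a row, so a column slice from
# "A:" to "A:\xff" works under any partitioner -- no OPP required.
columns = {"user1": {"A:1": "...", "A:2": "...", "B:1": "..."}}

def column_slice(columns, user, start, finish):
    # Emulates a SliceRange over column names within one row.
    names = sorted(columns[user])
    return [n for n in names if start <= n <= finish]

print(row_range_scan(rows, "user1."))               # all of user1's things
print(column_slice(columns, "user1", "A:", "A:\xff"))  # just user1's type-A things
```

(On the "all columns" shorthand asked about above: if memory serves, a SlicePredicate can carry a SliceRange with empty start and finish instead of an explicit column list, but verify that against your client library.)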

You could use supercolumns here too (where the supercolumn name is the
thing type).  If you always want to retrieve all things of type A at a
time per user, then that is a more natural fit.  (Otherwise, the lack
of subcolumn indexing could be a performance gotcha for you.)
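A sketch of why the missing subcolumn index can bite (hypothetical data and function names; only the relative access cost is the point):

```python
# Hypothetical illustration of the subcolumn-indexing gotcha.  A
# supercolumn is stored as one serialized unit, modeled here as a dict.
user_row = {
    "A": {"1": "...", "2": "...", "3": "..."},  # supercolumn "A" = thing type
    "B": {"1": "..."},
}

def get_supercolumn(row, sc_name):
    # Reading a whole thing-type at once is the cheap, natural access path.
    return row[sc_name]

def get_subcolumn(row, sc_name, sub_name):
    # With no subcolumn index, the entire supercolumn is deserialized to
    # find one value -- O(size of supercolumn) even for a single read.
    whole = dict(row[sc_name])  # stand-in for "deserialize everything"
    return whole[sub_name]
```

So if your common query is "give me all of user X's type-A things", supercolumns fit; if it's "give me type-A thing 2", each read pays for the whole group.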

>  3) schema changes (i.e. adding a new CF)... seems like currently you take the whole
> cluster down to accomplish this... is that likely to change in the future?

You have to take each node down, but a rolling restart is fine.  No
reason for the whole cluster to be down at once.

We're planning to make CF changes doable against live nodes for 0.7.

>  4) any tuning suggestions for this kind of setup? (primary data store using OrderPreservingPartitioner
> doing lots of range scans, etc.)

Nothing unusual -- just the typical "try to have enough RAM to cache
your 'hot' data set."

>  5) I noticed mention in some discussion that the OrderPreserving mode is not as well
> utilized and is probably in need of optimizations... how serious is that, and are there people
> working on that, or is help needed?

We have range queries in our stress testing tool now, and with Hadoop
integration coming in 0.6 I expect it will get a lot more testing.
Certainly anyone who wants to get their hands dirty is welcome. :)

>  6) hardware... we could certainly choose to go with pretty beefy hardware, especially
> in terms of RAM... is there a point where it just isn't useful?

Some hardware recommendations are written up elsewhere.  In general,
don't go beyond the "knee" of the price/performance curve, since you
can always add more nodes instead.

Past "enough for your memtables", RAM is only useful for caching
reads; it won't help write performance.  So that's the main factor in
"how much do I need."
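A back-of-the-envelope for that sizing (the 64 MB per-CF threshold and the 2x in-flight factor below are illustrative assumptions, not shipped defaults -- check your storage config):

```python
# Rough memtable RAM sizing: per-CF flush threshold x number of column
# families x a factor for memtables that are mid-flush.  All numbers
# here are assumptions for illustration, not Cassandra defaults.
memtable_threshold_mb = 64    # assumed per-CF memtable flush threshold
column_families = 4           # assumed number of CFs in the keyspace
in_flight_factor = 2          # a flushing memtable plus its live replacement

memtable_ram_mb = memtable_threshold_mb * column_families * in_flight_factor
print(memtable_ram_mb)  # RAM beyond this (plus JVM overhead) goes to read caching
```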

