incubator-cassandra-user mailing list archives

From Denis Haskin <>
Subject Appropriate use for Cassandra?
Date Wed, 05 May 2010 04:50:31 GMT
I've been reading everything I can get my hands on about Cassandra and
it sounds like a possibly very good framework for our data needs; I'm
about to take the plunge and do some prototyping, but I thought I'd
see if I can get a reality check here on whether it makes sense.

Our schema should be fairly simple; we may only keep our original data
in Cassandra, and the rollups and analyzed results in a relational db
(although this is still open for discussion).

We have fairly small records: 120-150 bytes, in maybe 18 columns.
Data is additive only; we would rarely, if ever, be deleting data.

Our core data set will accumulate at somewhere between 14 and 27
million rows per day; we'll be starting with about a year and a half
of data (7.5 - 15 billion rows) and eventually would like to keep 5
years online (25 to 50 billion rows).  (So that's roughly 1.3TB per
year, data only.  Not sure about the overhead yet.)
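A rough way to sanity-check these figures is to multiply them out. The sketch below uses only the row rates and record sizes quoted above. The replication factor of 3 is a hypothetical assumption, not something stated in the post. On-disk overhead (stored column names, per-column timestamps, indexes, compaction headroom) is deliberately ignored, so real disk usage will be higher. With these inputs the raw data comes out between about 0.6 and 1.5 TB per year (decimal terabytes).

```python
# Back-of-envelope sizing for the core data set described above.
# Inputs: 14-27 million rows/day, 120-150 bytes/row (from the post).
# REPLICATION_FACTOR is a hypothetical assumption; storage overhead is ignored.

DAYS_PER_YEAR = 365
REPLICATION_FACTOR = 3  # hypothetical, not from the post


def estimate(rows_per_day: float, bytes_per_row: float, years: float) -> dict:
    """Return total rows, raw terabytes, and replicated terabytes (1 TB = 1e12 bytes)."""
    rows = rows_per_day * DAYS_PER_YEAR * years
    raw_bytes = rows * bytes_per_row
    return {
        "rows": rows,
        "raw_tb": raw_bytes / 1e12,
        "replicated_tb": raw_bytes * REPLICATION_FACTOR / 1e12,
    }


if __name__ == "__main__":
    # Low and high ends of the quoted ranges, for 1, 1.5 and 5 years online.
    for label, rows_per_day, bytes_per_row in [("low", 14e6, 120), ("high", 27e6, 150)]:
        for years in (1, 1.5, 5):
            e = estimate(rows_per_day, bytes_per_row, years)
            print(
                f"{label:>4} {years:>3} yr: {e['rows'] / 1e9:6.1f}B rows, "
                f"{e['raw_tb']:5.2f} TB raw, "
                f"{e['replicated_tb']:5.2f} TB at RF={REPLICATION_FACTOR}"
            )
```

At the high end, five years works out to roughly 49 billion rows and about 7.4 TB of raw data before replication or overhead.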

Ideally we'd like to also have a cluster with our complete data set,
which is maybe 38 billion rows per year (we could live with less than
5 years of that).

I haven't really thought through what the schema's going to be; our
primary key is an entity's ID plus a timestamp.  But there are 2 or 3
other retrieval paths we'll need to support as well.
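One layout often discussed for an "entity ID plus timestamp" key is a wide row per entity with time-ordered columns. Each extra retrieval path is handled by writing the same event again into a separate index row, since the data is append-only. The sketch below simulates that layout in memory; it is not a Cassandra client. Bucketing rows by day (`entity:YYYYMMDD`) is an assumption meant to keep any single row from growing without bound. `ToyStore`, `row_key`, the `region` attribute and the `meter-*` IDs are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def row_key(entity_id: str, day: date) -> str:
    """Hypothetical row key: one row per entity per day (time bucketing)."""
    return f"{entity_id}:{day:%Y%m%d}"


class ToyStore:
    """In-memory stand-in for two column families; NOT a Cassandra API."""

    def __init__(self):
        # Main "column family": row key -> {timestamp column: record}.
        self.readings = defaultdict(dict)
        # Hypothetical secondary path: region -> {(entity_id, timestamp)}.
        self.by_region = defaultdict(set)

    def insert(self, entity_id: str, ts: datetime, record: dict) -> None:
        # Append-only: a write never reads or deletes existing data.
        self.readings[row_key(entity_id, ts.date())][ts] = record
        # Denormalize: write the event again under the extra retrieval path.
        self.by_region[record["region"]].add((entity_id, ts))

    def slice(self, entity_id: str, start: datetime, end: datetime) -> list:
        """Return (timestamp, record) pairs with start <= ts < end, in time order."""
        out = []
        day = start.date()
        while day <= end.date():  # visit each daily bucket the range touches
            row = self.readings.get(row_key(entity_id, day), {})
            out.extend((ts, row[ts]) for ts in sorted(row) if start <= ts < end)
            day += timedelta(days=1)
        return out

    def entities_in_region(self, region: str) -> list:
        """Answer the secondary query from the index row alone."""
        return sorted({eid for eid, _ in self.by_region.get(region, set())})


if __name__ == "__main__":
    store = ToyStore()
    t0 = datetime(2010, 5, 4, 23, 0)
    store.insert("meter-17", t0, {"region": "east", "value": 1})
    store.insert("meter-17", t0 + timedelta(hours=2), {"region": "east", "value": 2})
    store.insert("meter-42", t0, {"region": "west", "value": 3})
    print(store.slice("meter-17", t0, t0 + timedelta(hours=3)))  # spans two day buckets
    print(store.entities_in_region("east"))
```

The trade-off in this layout is extra writes and storage for every additional retrieval path, in exchange for reads that never need a join or a full scan.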

Thoughts?  Pitfalls?  Gotchas? Are we completely whacked?


