cassandra-user mailing list archives

From philip andrew <philip14...@gmail.com>
Subject Re: Appropriate use for Cassandra?
Date Thu, 06 May 2010 04:09:48 GMT
http://www.youtube.com/watch?v=eaCCkfjPm0o
3:30 song begins
4:00 starfish loves you and Cassandra loves you!

On Thu, May 6, 2010 at 11:03 AM, Denis Haskin <denis@haskinferguson.net> wrote:

> i can haz hints pleez?
>
> On Wed, May 5, 2010 at 9:28 PM, philip andrew <philip142au@gmail.com>
> wrote:
> > Starfish loves you.
> >
> > On Wed, May 5, 2010 at 1:16 PM, David Strauss <david@fourkitchens.com>
> > wrote:
> >>
> >> On 2010-05-05 04:50, Denis Haskin wrote:
> >> > I've been reading everything I can get my hands on about Cassandra and
> >> > it sounds like a possibly very good framework for our data needs; I'm
> >> > about to take the plunge and do some prototyping, but I thought I'd
> >> > see if I can get a reality check here on whether it makes sense.
> >> >
> >> > Our schema should be fairly simple; we may only keep our original data
> >> > in Cassandra, and the rollups and analyzed results in a relational db
> >> > (although this is still open for discussion).
> >>
> >> This is what we do on some projects. This is a particularly nice
> >> strategy if the raw : aggregated ratio is really high or the raw data is
> >> bursty or highly volatile.
> >>
> >> Consider Hadoop integration for your aggregation needs.
> >>
> >> > We have fairly small records: 120-150 bytes, in maybe 18 columns.
> >> > Data is additive only; we would rarely, if ever, be deleting data.
> >>
> >> Cassandra loves you.
> >>
> >> > Our core data set will accumulate at somewhere between 14 and 27
> >> > million rows per day; we'll be starting with about a year and a half
> >> > of data (7.5 - 15 billion rows) and eventually would like to keep 5
> >> > years online (25 to 50 billion rows).  (So that's maybe 1.3TB or so
> >> > per year, data only.  Not sure about the overhead yet.)
> >> >
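A quick back-of-envelope check of the figures quoted above (row sizes and daily rates are taken straight from the email; this is only a sanity check of the "maybe 1.3TB or so per year" estimate, raw data with no storage overhead):

```python
# Sanity check of the storage estimate in the quoted email.
# Row sizes (120-150 bytes) and daily rates (14-27 million rows)
# come from the message; no Cassandra overhead is included.
ROW_BYTES_LOW, ROW_BYTES_HIGH = 120, 150
ROWS_PER_DAY_LOW, ROWS_PER_DAY_HIGH = 14_000_000, 27_000_000

def tb_per_year(row_bytes: int, rows_per_day: int) -> float:
    """Raw data volume per year in decimal terabytes."""
    return row_bytes * rows_per_day * 365 / 1e12

low = tb_per_year(ROW_BYTES_LOW, ROWS_PER_DAY_LOW)     # ~0.61 TB/year
high = tb_per_year(ROW_BYTES_HIGH, ROWS_PER_DAY_HIGH)  # ~1.48 TB/year
```

So the upper end lands a bit above the 1.3TB/year figure; actual on-disk size would be larger once column names, timestamps, and replication are counted.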
> >> > Ideally we'd like to also have a cluster with our complete data set,
> >> > which is maybe 38 billion rows per year (we could live with less than
> >> > 5 years of that).
> >> >
> >> > I haven't really thought through what the schema's going to be; our
> >> > primary key is an entity's ID plus a timestamp.  But there's 2 or 3
> >> > other retrieval paths we'll need to support as well.
> >>
> >> Generally, you do multiple retrieval paths through denormalization in
> >> Cassandra.
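The denormalization pattern above can be sketched in a few lines of plain Python, with dicts standing in for column families (all names here are illustrative, not from the original message): each retrieval path gets its own key layout, and one logical insert fans out into one physical write per path, so every query becomes a direct key lookup.

```python
# Illustrative sketch of denormalized retrieval paths.
# Plain dicts stand in for Cassandra column families; "region" is a
# hypothetical second query path, not something named in the thread.
by_entity_time = {}  # path 1: look up by (entity_id, timestamp)
by_region_time = {}  # path 2: look up by (region, timestamp)

def insert(record: dict) -> None:
    # One logical insert -> one write per retrieval path.
    by_entity_time[(record["entity_id"], record["ts"])] = record
    by_region_time[(record["region"], record["ts"])] = record

insert({"entity_id": "e1", "ts": 1000, "region": "us"})
```

Since the data set is additive-only, the usual downside of denormalization (keeping copies consistent on update) largely disappears.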
> >>
> >> > Thoughts?  Pitfalls?  Gotchas? Are we completely whacked?
> >>
> >> Does the random partitioner support what you need?
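The partitioner question matters because the random partitioner places rows by the MD5 hash of the row key, so keys that are adjacent lexically (e.g. consecutive timestamps for one entity) land far apart on the ring and key-range scans are not supported. A minimal sketch of why (the key format is a made-up example):

```python
import hashlib

# Under the random partitioner, row placement follows the MD5 hash of
# the key, so hash order is unrelated to key order and range scans over
# row keys don't work; time-ordered access is usually modeled with
# ordered columns inside a single row instead.
def token(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

# Hypothetical "entity_id:date" keys for three consecutive days:
keys = ["entity1:2010-05-01", "entity1:2010-05-02", "entity1:2010-05-03"]
tokens = [token(k) for k in keys]
```

If key-range scans are a hard requirement, the alternatives are the order-preserving partitioner (with its hot-spot risks) or bucketing time ranges into rows.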
> >>
> >> --
> >> David Strauss
> >>   | david@fourkitchens.com
> >> Four Kitchens
> >>   | http://fourkitchens.com
> >>   | +1 512 454 6659 [office]
> >>   | +1 512 870 8453 [direct]
> >>
> >
> >
>
>
>
> --
> dwh
>
