incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: schema design question
Date Tue, 09 Mar 2010 13:23:00 GMT
On Tue, Mar 9, 2010 at 3:53 AM, Matteo Caprari <matteo.caprari@gmail.com> wrote:
> Thanks Jonathan.
>
> Correct me if I'm wrong: you are suggesting that each time we receive a new
> row (item, [users]) we do 2 operations:
>
> 1) insert (or merge) this row 'as it is' (item, [users])
> 2) for each user in [users]: insert  (user, [item])
>
> Each incoming item is liked by 100 users, so it would be 100 db ops per item.
> User ids are 20 bytes, so it's about 2 KB per item sent to the database.

Right.
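
A minimal sketch of the two-step write described above, using plain Python dicts as stand-ins for two column families. The names item_likes, user_likes and record_likes are hypothetical and this is not Cassandra client code; the empty value is just a placeholder, on the assumption that only the column names matter.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two column families:
# row key -> {column name: value}. Not a real Cassandra API.
item_likes = defaultdict(dict)  # item_id -> {user_id: b""}
user_likes = defaultdict(dict)  # user_id -> {item_id: b""}

def record_likes(item_id, user_ids):
    """Apply both writes for one incoming (item, [users]) row."""
    # 1) insert (or merge) the row as received: (item, [users])
    item_likes[item_id].update({user_id: b"" for user_id in user_ids})
    # 2) for each user, insert the inverted entry: (user, [item])
    for user_id in user_ids:
        user_likes[user_id][item_id] = b""

record_likes("item1", ["alice", "bob"])
record_likes("item2", ["bob"])
print(sorted(user_likes["bob"]))  # items liked by bob
```

Step 2 is where the roughly 100 writes per item come from: one small insert per user in the incoming row.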

> At about 10 items/sec, we are looking at about 1k db ops/sec, or 20 KB/sec.
>
> Can you make a gross estimate of hardware requirements?

One quad-core node can handle ~14000 inserts per second so you are in
good shape.
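
The back-of-the-envelope numbers from this thread, worked through as a quick check. It is only a rough sketch: it counts the per-user inserts alone and assumes nothing about replication, reads, or row overhead.

```python
ITEMS_PER_SEC = 10
USERS_PER_ITEM = 100
USER_ID_BYTES = 20
NODE_INSERTS_PER_SEC = 14000  # rough single-node figure quoted above

ops_per_sec = ITEMS_PER_SEC * USERS_PER_ITEM      # one insert per (user, item)
bytes_per_item = USERS_PER_ITEM * USER_ID_BYTES   # user ids only
bytes_per_sec = ITEMS_PER_SEC * bytes_per_item
headroom = NODE_INSERTS_PER_SEC / ops_per_sec     # multiple of the needed rate

print(ops_per_sec, bytes_per_item, bytes_per_sec, headroom)
# prints: 1000 2000 20000 14.0
```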

> We don't know when the like-ing happened: is there something like
> incremental column names?

You can use insert time, or just use a LexicalUUID.

> Or can I use item_id as column name and a null-ish placeholder as value?

Or that too.

> I share Keith's concern: if we use a Long as the column name, won't we end up
> seeing just one user
> instead of 'all users that liked N items'?

That's true.  So you'd want to use a custom comparator where the first 64
bits are the Long and the rest is the userid, for instance.

(Long + something else is common enough that we might want to add it
to the defaults...)
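
A rough sketch of the "Long + userid" column name idea, in plain Python just to show the byte layout and the ordering. The function names are hypothetical, the count is assumed to be non-negative, and an actual comparator would have to be written in Java against Cassandra's own comparator interface rather than like this.

```python
import functools
import struct

def make_column_name(count, user_id):
    """Pack a non-negative 64-bit count followed by the raw user id bytes."""
    return struct.pack(">Q", count) + user_id

def compare_column_names(a, b):
    """Order by the leading 64-bit count first, then by the user id bytes."""
    (count_a,) = struct.unpack(">Q", a[:8])
    (count_b,) = struct.unpack(">Q", b[:8])
    key_a = (count_a, a[8:])
    key_b = (count_b, b[8:])
    return (key_a > key_b) - (key_a < key_b)

names = [
    make_column_name(3, b"bob"),
    make_column_name(3, b"alice"),
    make_column_name(1, b"carol"),
]
ordered = sorted(names, key=functools.cmp_to_key(compare_column_names))
print([name[8:] for name in ordered])
```

Because the userid is part of the name, two users with the same count get distinct columns instead of overwriting each other, while columns still sort by count first.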

-Jonathan
