On Feb 22, 2010, at 12:19 AM, ext Jonathan Ellis wrote:
>> 2) is the row key model I suggested above the best approach in Cassandra, or is
>> there something better? My testing so far has been using get_range_slice with a ColumnParent
>> of just the CF and SlicePredicate listing the columns I want (though really I want all columns,
>> is there a shorthand for that?)
>
> Cassandra deals fine with millions of columns per row, and allows
> prefix queries on columns too. So an alternate model would be to have
> userX as row key, and column keys "A:1, A:2, A:3, ..., B:1, B:2, B:3,
> ...". This will be marginally faster than splitting by row, and has
> the added advantage of not requiring OPP.
>
> You could use supercolumns here too (where the supercolumn name is the
> thing type). If you always want to retrieve all things of type A at a
> time per user, then that is a more natural fit. (Otherwise, the lack
> of subcolumn indexing could be a performance gotcha for you:
> http://issues.apache.org/jira/browse/CASSANDRA-598).
Would you say the supercolumn approach is faster than scanning rows? Any particular advantages
or disadvantages to writing to a bunch of supercolumns at once (e.g. in one user row), vs.
writing to a bunch of rows at once (with the same key prefix, i.e. close together in an order-preserved
store)?
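
A minimal sketch of the single-row layout discussed above, in plain Python with no Cassandra client involved. The names (row, slice_by_prefix) are made up for illustration, and a sorted list stands in for the sorted columns within one user's row:

```python
from bisect import bisect_left

def slice_by_prefix(column_names, prefix):
    """Return every name that starts with prefix, given a sorted list of names."""
    # First position whose name is >= the prefix.
    start = bisect_left(column_names, prefix)
    # First position past the prefix range; "\uffff" is an assumed upper bound
    # that sorts after any character used in these column names.
    end = bisect_left(column_names, prefix + "\uffff")
    return column_names[start:end]

# One row for userX; column names are "<thing type>:<id>", kept in sorted order.
row = sorted(["B:2", "A:1", "B:1", "A:3", "A:2", "B:3"])

type_a = slice_by_prefix(row, "A:")  # all things of type A for this user
type_b = slice_by_prefix(row, "B:")  # all things of type B for this user
```

Because the names are compared as strings here, "A:10" would sort before "A:2"; zero-padding the id part is one way around that if numeric order matters.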
>
>> 3) schema changes (i.e. adding a new CF)... seems like currently you take the whole
>> cluster down to accomplish this... is that likely to change in the future?
>
> You have to take each node down, but a rolling restart is fine. No
> reason for the whole cluster to be down at once.
OK, that's not a big deal.
Extremely helpful... thanks for the response!
Jeremey.