On Mon, Feb 1, 2010 at 3:31 PM, Brandon Williams <driftx@gmail.com> wrote:
On Mon, Feb 1, 2010 at 5:20 PM, Erik Holstad <erikholstad@gmail.com> wrote:
Have a couple of questions about the best way to use Cassandra.
Which is preferable: the random partitioner + multi_get calls, or order preservation + range_slice calls?

When you use an OPP, the distribution of your keys becomes your problem.  If you don't have an even distribution, this will be reflected in the load on the nodes, while the RP gives you even distribution.
Yeah, that is why it would be nice to hear whether anyone has compared the performance of the two,
to see if it is worth worrying about your own distribution. I also read that the random partitioner doesn't
give that great a distribution.
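
To make the distribution point concrete, here is a rough Python sketch (not Cassandra code) that counts how many keys land on each node of a four-node ring under the two schemes. Everything in it is an assumption for illustration: the helper names (md5_token, node_for, load_per_node), the node tokens, and the "userNNNN" key pattern are made up, and the MD5 digest is treated as a plain unsigned 128-bit integer, which simplifies what the RandomPartitioner really does with it.

```python
import hashlib
from bisect import bisect_left
from collections import Counter


def md5_token(key):
    # Simplified stand-in for a hash-based partitioner: token = MD5 of the key.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def node_for(token, ring):
    # A key belongs to the first node whose token is >= the key's token,
    # wrapping around to node 0 past the end of the ring.
    return bisect_left(ring, token) % len(ring)


def load_per_node(keys, token_fn, ring):
    # Count how many keys each node in the (sorted) ring would own.
    counts = Counter(node_for(token_fn(k), ring) for k in keys)
    return [counts.get(i, 0) for i in range(len(ring))]


# Hypothetical workload: keys that all share a common prefix.
keys = ["user%04d" % i for i in range(1000)]

# Hash-based ring: four node tokens evenly spaced over the 128-bit space.
rp_ring = [(i + 1) * 2**128 // 4 - 1 for i in range(4)]

# Order-preserving ring: node tokens are themselves keys (hypothetical choice).
opp_ring = ["g", "n", "t", "z"]

print("hashed tokens:   ", load_per_node(keys, md5_token, rp_ring))
print("order-preserving:", load_per_node(keys, lambda k: k, opp_ring))
```

With these made-up tokens every "user..." key sorts between "t" and "z", so the last node owns all of them under the order-preserving scheme, while the hashed tokens spread across the ring. Picking OPP tokens that match the real key distribution fixes this, which is exactly the part that becomes your problem.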

What is the benefit of using multiple families vs super column?

http://issues.apache.org/jira/browse/CASSANDRA-598 is currently why I prefer simple CFs instead of supercolumns.
Yeah, this is nasty.
For example, in the case of sorting
in different orders: one good thing that I can see when using super columns is that you don't
have to restart your cluster every time you want to add a new sort order.

A supercolumn can still only compare subcolumns in a single way.
Yeah, I know that, but you can have a super column per sort order without having to restart the cluster.
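
A rough sketch of what one super column per sort order could look like, modelled with plain Python dicts rather than a real client. The super column names (by_name, by_date), the "sortkey:id" subcolumn naming, and the helpers index_row and read_sorted are all hypothetical; sorted() stands in for the single subcolumn comparator configured on the column family.

```python
def index_row(items):
    # One row of a super column family: {super_column: {subcolumn_name: value}}.
    # Each super column holds the same items, keyed by a different sort key.
    row = {"by_name": {}, "by_date": {}}
    for item in items:
        row["by_name"]["%s:%s" % (item["name"], item["id"])] = item["id"]
        row["by_date"]["%s:%s" % (item["date"], item["id"])] = item["id"]
    return row


def read_sorted(row, super_column):
    # All subcolumns are ordered by the same comparator; the sort order comes
    # entirely from how the subcolumn names were built.
    subcolumns = row[super_column]
    return [subcolumns[name] for name in sorted(subcolumns)]


items = [
    {"id": "1", "name": "carol", "date": "2010-01-03"},
    {"id": "2", "name": "alice", "date": "2010-02-01"},
    {"id": "3", "name": "bob", "date": "2010-01-15"},
]
row = index_row(items)
print(read_sorted(row, "by_name"))
print(read_sorted(row, "by_date"))
```

Adding another order here only means writing under a new super column name, which needs no config change. The comparator is still the same for every super column, so the subcolumn names have to encode the order themselves.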

When http://issues.apache.org/jira/browse/CASSANDRA-44 is completed, you will be able to add CFs without restarting.
Looks interesting, but it is targeted at 0.7, so it is probably going to be a little while, right?


Regards Erik