incubator-cassandra-user mailing list archives

From Mark Robson <mar...@gmail.com>
Subject Re: Cassandra data distribution and configuration settings
Date Wed, 18 Nov 2009 07:52:44 GMT
2009/11/17 Richard Grossman <richiesgr@gmail.com>

> How do I evaluate the value I need to put here?
> The second point is that I have many column families, each with a
> different key, so how do I know which token to use to distribute the data?
>

It's not automatic at the moment.

If you leave it to pick its own token, it'll pick one at random from the
character range it uses (I think 0-9a-zA-Z). That's not ideal if you're
using, say, lowercase hex keys, since those only cover 0-9a-f and many
random tokens will fall outside that range.

The only solution for now is to specify your own tokens.

In 0.5 it seems likely that new nodes will load-balance and bootstrap
automatically, so the best strategy then would be to start with just one
or two nodes and load a small sample of data before bootstrapping the
remaining ones.

If you know your keys will (start with or) be a hex number, then just set
the tokens to 0,4,8,c (if you have 4 nodes, for example). Or anything
really, as long as they're evenly distributed.
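
As a rough illustration of "evenly distributed", here is a small Python
sketch that spaces tokens evenly over the lowercase hex key space. The
function name hex_tokens and the digits parameter are made up for this
example and are not part of Cassandra; it assumes your keys start with
lowercase hex characters and that tokens compare as plain strings.

```python
def hex_tokens(node_count, digits=1):
    """Return node_count evenly spaced lowercase hex tokens.

    digits is the token width in hex characters (hypothetical parameter);
    a fixed width keeps string order identical to numeric order.
    """
    space = 16 ** digits  # number of distinct tokens of this width
    if not 1 <= node_count <= space:
        raise ValueError("node_count must be between 1 and 16**digits")
    return [format(i * space // node_count, "0{}x".format(digits))
            for i in range(node_count)]


print(hex_tokens(4))            # ['0', '4', '8', 'c']
print(hex_tokens(3, digits=2))  # ['00', '55', 'aa']
```

With more nodes than 16, a wider token (digits=2 or more) is needed so
that every node gets a distinct value.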

Choosing keys correctly is important for the ordered partitioner. You
presumably want to be able to do range scans (or you'd use
RandomPartitioner), but you also want your data to be spread out.

My plan is to prefix each key with a short hex hash of the customer id (a
part I don't need to range scan), followed by the rest of the key (which I
do need to range scan). That way I can still range scan within, e.g., one
customer's data, while the customers are spread more evenly across the
nodes.
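
A minimal Python sketch of the key scheme described above, under some
assumptions of mine: the hash is SHA-256 truncated to two hex characters,
the customer id itself is kept in the key after the hash (so two customers
whose hashes collide still don't interleave), and ":" is used as a
separator that never appears in a customer id. The names customer_prefix
and make_key are hypothetical, not anything from Cassandra.

```python
import hashlib

SEP = ":"  # hypothetical separator; assumed never to occur in a customer id


def customer_prefix(customer_id, prefix_len=2):
    """Return the part of the key shared by all of one customer's rows."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    # A short lowercase hex hash spreads customers across the key space.
    return digest[:prefix_len] + SEP + customer_id + SEP


def make_key(customer_id, rest, prefix_len=2):
    """Build a full row key: hash prefix, customer id, then the scannable part."""
    return customer_prefix(customer_id, prefix_len) + rest


# Keys for one customer share a prefix, so they stay adjacent when sorted.
keys = sorted(make_key(c, day)
              for c in ("alice", "bob", "carol")
              for day in ("2009-11-17", "2009-11-18"))
for key in keys:
    print(key)
```

A range scan for one customer would then start at customer_prefix(id) and
cover only keys beginning with that string, while different customers land
at effectively random positions in the 00-ff range.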

Mark
