incubator-cassandra-user mailing list archives

From Lucas Di Pentima <lu...@di-pentima.com.ar>
Subject Re: when i use the OrderPreservingPartition, the load is very imbalance
Date Mon, 26 Apr 2010 11:49:30 GMT
Hello Mark,

On 26/04/2010, at 07:17, Mark Robson wrote:

> I think the solution to this would be to choose your nodes' tokens wisely
> before you start inserting data, and if possible, modify the keys to split
> them better between the nodes.
>
> For example, if your key has two parts, one of which you want to range scan,
> another which you don't. Say you have customer_id and a timestamp. The
> customer ID does not need to be range scanned, so you can hash it into a hex
> value (say), then append the timestamp (in a lexically sortable way of
> course). So you'd end up with keys like
>
> HHHH-0012345-0001234567890
>
> Where HHHH is a hash of the customer ID, 0012345 is the customer ID, and the
> rest is a timestamp.
>
> You'd be able to do a time range scan by using the known prefixes, and
> distributing your nodes equally from 0000 to ffff would result in fairly even
> data (provided you don't have a very small number of very large customers).
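
To make sure I read the key layout correctly, here is a minimal sketch of it in Ruby. The helper names `build_key` and `initial_tokens` are hypothetical, and the choice of MD5 for the hash, the 7-digit customer ID and the 13-digit timestamp width are my assumptions taken from the example key above, not anything Cassandra prescribes.

```ruby
require 'digest'

# Hypothetical helper: builds a row key shaped like HHHH-0012345-0001234567890.
def build_key(customer_id, timestamp)
  # First 4 hex digits of a hash of the customer ID (MD5 is an assumption).
  hash_prefix = Digest::MD5.hexdigest(customer_id.to_s)[0, 4]
  # Zero-padding keeps lexical order identical to numeric order.
  format('%s-%07d-%013d', hash_prefix, customer_id, timestamp)
end

# Hypothetical helper: evenly spaced 4-hex-digit tokens between 0000 and ffff,
# one per node, to be assigned before any data is inserted.
def initial_tokens(node_count)
  (0...node_count).map { |i| format('%04x', i * 0x10000 / node_count) }
end
```

With four nodes, `initial_tokens(4)` gives 0000, 4000, 8000 and c000, and all keys of one customer share the same `HHHH-0012345-` prefix.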


How do you ask Cassandra to do a range scan with a prefix? As far as I can tell, you can't
do something like:

db.get_range('SomeCF', :start => 'HHHH-0012345-*')

...can you?
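
The closest thing I can imagine is turning the prefix into an explicit start/finish pair. Below is a minimal sketch of that idea, run against a plain in-memory array rather than a real cluster. `prefix_bounds` and `scan_prefix` are hypothetical helpers, and I am assuming keys compare lexically and never contain a character above `~`. Whether the client accepts such a pair is exactly what I am unsure about.

```ruby
# Hypothetical helper: turns a key prefix into an inclusive [start, finish] pair.
# '~' sorts after digits, hex letters and '-', so it caps every key that
# begins with the prefix (an assumption about the key alphabet).
def prefix_bounds(prefix, sentinel = '~')
  [prefix, prefix + sentinel]
end

# Hypothetical helper: simulates a range scan over keys held in memory.
def scan_prefix(keys, prefix)
  start, finish = prefix_bounds(prefix)
  keys.sort.select { |k| k >= start && k <= finish }
end

keys = [
  'a1b2-0012344-0000000000001',
  'a1b2-0012345-0000000000001',
  'a1b2-0012345-9999999999999',
  'a1b2-0012346-0000000000001'
]
matches = scan_prefix(keys, 'a1b2-0012345-')
```

Here `matches` holds only the two keys for customer 0012345.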


Regards
--
Lucas Di Pentima - Santa Fe, Argentina
Jabber: lucas@di-pentima.com.ar
MSN: ldipenti75@hotmail.com

