incubator-cassandra-user mailing list archives

From David McNelis <dmcne...@agentisenergy.com>
Subject Re: A good key for data distribution over nodes
Date Mon, 10 Oct 2011 14:16:32 GMT
You should be OK, depending on the partitioner you use.  With the random
partitioner, whatever your key is gets run through MD5, and the resulting
hash (the token) determines which node your data will live on.  That is
also why you can give each node a specific token when you're setting up
your nodes: the token decides which range of hash values the node owns.

So while your distribution won't necessarily be completely balanced, it
should at least be in the right ballpark.
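
As a rough illustration of the hashing described above, here is a minimal
Python sketch.  It assumes the random partitioner's token is the MD5 digest
of the key read as a signed integer and made non-negative, and that nodes
have evenly spaced tokens; the helper names (token_for, owner,
evenly_spaced_tokens) are made up for this example and are not Cassandra
APIs.  The keys are one day of timestamps in the format from the question.

```python
import hashlib
from bisect import bisect_left
from collections import Counter
from datetime import datetime, timedelta

RING_SIZE = 2 ** 127  # assumed token range of the random partitioner


def token_for(key: str) -> int:
    """Approximate token: MD5 digest as a signed integer, absolute value."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def evenly_spaced_tokens(node_count: int) -> list:
    """Hypothetical ring layout: node i gets token i * RING_SIZE / node_count."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def owner(token: int, node_tokens: list) -> int:
    """Index of the first node whose token is >= the key token, wrapping around."""
    return bisect_left(node_tokens, token) % len(node_tokens)


# One key per second for a whole day, e.g. "2011/10/10-16:07:53".
start = datetime(2011, 10, 10)
keys = [
    (start + timedelta(seconds=s)).strftime("%Y/%m/%d-%H:%M:%S")
    for s in range(86400)
]

node_tokens = evenly_spaced_tokens(4)
counts = Counter(owner(token_for(k), node_tokens) for k in keys)
print(sorted(counts.items()))
```

Even though neighbouring keys differ by a single character, their hashes
are unrelated, so the per-node counts should come out close to each other
but not identical.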

To give you an idea of this in practice, we've got consecutive integer
values as our keys and we're using the random partitioner, and we have VERY
close to the same number of keys on each of our nodes.  The bigger question
for balancing your load is how big each record is: whether records are
consistent in size, vary widely, etc.  That is just as likely to affect how
balanced your loads are as the key distribution is.
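
To make the point about record size concrete, here is a small Python
sketch under simplified assumptions: consecutive integer keys, four nodes
that each own an equal slice of the MD5 hash range, and invented record
sizes (the 1 KB and 10 MB figures and the 1-in-1,000 ratio are purely
hypothetical).  It compares how evenly keys are spread with how evenly
bytes are spread.

```python
import hashlib
import random


def node_for(key: str, node_count: int) -> int:
    """Map a key to a node by slicing the assumed 0..2**127 hash range evenly."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    token = abs(int.from_bytes(digest, "big", signed=True))
    return min(token * node_count // 2 ** 127, node_count - 1)


def load_per_node(record_sizes: dict, node_count: int):
    """Return (key count per node, total bytes per node)."""
    counts = [0] * node_count
    total_bytes = [0] * node_count
    for key, size in record_sizes.items():
        node = node_for(key, node_count)
        counts[node] += 1
        total_bytes[node] += size
    return counts, total_bytes


def spread(values) -> float:
    """(max - min) relative to the mean; 0.0 means perfectly balanced."""
    return (max(values) - min(values)) / (sum(values) / len(values))


rng = random.Random(42)  # fixed seed so runs are repeatable
keys = [str(i) for i in range(100_000)]  # consecutive integer keys

# Case 1: every record is 1 KB.
uniform = {k: 1_000 for k in keys}
# Case 2 (hypothetical skew): about 1 record in 1,000 is 10 MB.
skewed = {k: (10_000_000 if rng.random() < 0.001 else 1_000) for k in keys}

counts, uniform_bytes = load_per_node(uniform, 4)
_, skewed_bytes = load_per_node(skewed, 4)

print("keys per node:   ", counts, "spread", round(spread(counts), 4))
print("uniform bytes:   ", uniform_bytes, "spread", round(spread(uniform_bytes), 4))
print("skewed bytes:    ", skewed_bytes, "spread", round(spread(skewed_bytes), 4))
```

The key counts are the same in both cases because the keys are the same;
only the byte totals change.  With a handful of very large records, the
byte totals will typically be noticeably less even than the key counts.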

On Mon, Oct 10, 2011 at 9:09 AM, Laurent Aufrechter <
laurent.aufrechter@yahoo.fr> wrote:

> Hi,
>
> I am planning to run tests on Cassandra with a few nodes. I want to create
> a column family where the key will be the date down to the second (like
> 2011/10/10-16:07:53). Doing so, my keys will be very similar to each
> other. Is it OK to use such keys if I want my data to be evenly distributed
> across my nodes, or do I have to "do something"?
>
> Thanks in advance.
>
> L. Aufrechter
>



-- 
*David McNelis*
Lead Software Engineer
Agentis Energy
www.agentisenergy.com
o: 630.359.6395
c: 219.384.5143

*A Smart Grid technology company focused on helping consumers of energy
control an often under-managed resource.*
