cassandra-user mailing list archives

From Benjamin Black <...@b3k.us>
Subject Re: Deployment on AWS and replication strategies
Date Sun, 04 Apr 2010 08:18:09 GMT
On Sat, Apr 3, 2010 at 8:23 PM, Mike Gallamore
<mike.e.gallamore@googlemail.com> wrote:
>>
> I didn't mean a real-time determination, more the case where the nodes
> aren't identical. For example, if you have a cluster made up of a bunch
> of EC2 light instances and decide to add a large instance, it would be
> nice if the new node got an amount of work proportional to its system
> specs.

Sure, set the token(s) appropriately.
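
As a rough illustration of setting tokens by capacity, here is a minimal
sketch. It assumes RandomPartitioner with a token space of 0 to 2**127 and
that a node owns the range from the previous token up to and including its
own. The node names and weights are hypothetical, and weighted_tokens is
not part of Cassandra.

```python
RING_SIZE = 2**127  # assumed RandomPartitioner token space


def weighted_tokens(weights):
    """Return a token per node so each node's range is proportional to its weight.

    weights: dict of node name -> relative capacity (hypothetical values).
    A node is assumed to own (previous token, its own token], wrapping around.
    """
    total = sum(weights.values())
    tokens = {}
    cumulative = 0
    for node, weight in weights.items():
        cumulative += weight
        # Place the token at the end of this node's share of the ring.
        tokens[node] = cumulative * RING_SIZE // total - 1
    return tokens


if __name__ == "__main__":
    # Two small instances and one large instance with 4x the capacity.
    for node, token in weighted_tokens(
        {"small-1": 1, "small-2": 1, "large-1": 4}
    ).items():
        print(node, token)
```

Each printed value would then be set as that node's initial token before
it is started.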

>>
>>> perhaps a preferred hash range not just a token (and presumably
>>> everything else would automatically rebalance itself to make that
>>> happen)
>>>
>>
>> Unclear what this would do.
> Well, rather than getting half of the busiest node's work (which is how
> I understand it works now) you'd get an amount of work that is
> proportional to the power of the node.

Assuming you allow it to automatically assign its own token, the new
node will get half the range of the node with the most data, not the
most 'busy'.  The amount of work being done by the nodes is not a
consideration, nor would you want automatic selection of that within
cassandra except with significant support for long term trend
collection and analysis, pluggable policies for calculating 'load',
etc.
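
To make the automatic behavior concrete, here is a simplified sketch of
"take half the range of the node with the most data". It is a model of
the idea, not Cassandra's bootstrap code: it assumes a 2**127 token space
and uses the numeric midpoint of the range as a stand-in for splitting
the data in half. pick_bootstrap_token and its inputs are hypothetical.

```python
RING_SIZE = 2**127  # assumed RandomPartitioner token space


def pick_bootstrap_token(tokens, data_bytes):
    """Return a token that splits the range of the node holding the most data.

    tokens: dict of node name -> current token.
    data_bytes: dict of node name -> bytes stored (request rate is ignored).
    """
    ring = sorted(tokens.items(), key=lambda item: item[1])
    names = [name for name, _ in ring]
    heaviest = max(names, key=lambda name: data_bytes[name])
    i = names.index(heaviest)
    end = ring[i][1]
    start = ring[i - 1][1]  # index -1 wraps to the last node on the ring
    width = (end - start) % RING_SIZE
    if width == 0:
        width = RING_SIZE  # a single node owns the whole ring
    return (start + width // 2) % RING_SIZE
```

Note that the only input besides the ring layout is stored data; nothing
about CPU, memory, or request load enters the choice.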

>>
>> Or just set the token specifically for each node you bootstrap.
>> Starting a node and crossing your fingers on its token selection is a
>> recipe for interesting times :)
> Can you specify a token based on a real key value? How do you know what
> token to use to make sure that locally relevant data gets at least one
> copy stored locally?

Again, placement strategy is what you want to investigate.

> My understanding is RackAwareStrategy puts the data on the next node in
> the token ring that is in a different datacenter. The problem is if you
> want a specific "other datacenter", not just the next one in the list.

Right, I suggested looking at the source as an example.  If you want a
more sophisticated placement policy, write one.  They are not
complicated and you will have a much deeper understanding of the
mechanism.  IMO, pluggable placement is a remarkable feature.
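
For a sense of how little logic is involved, here is a sketch of the ring
walk in Python. The real strategies are Java classes inside Cassandra, so
this only models the idea: the primary replica is the first node at or
after the key's token, and the second replica is the next node on the
ring in another datacenter, optionally restricted to one preferred
datacenter. The function name, ring layout, and datacenter names are all
hypothetical.

```python
from bisect import bisect_left


def pick_replicas(ring, key_token, preferred_dc=None):
    """Return [primary, second replica] node names for a key token.

    ring: list of (token, node name, datacenter) tuples.
    preferred_dc: if given, the second replica must be in this datacenter;
                  otherwise the next node in any other datacenter is used.
    """
    ring = sorted(ring)
    tokens = [token for token, _, _ in ring]
    # First node whose token is >= the key token, wrapping past the end.
    i = bisect_left(tokens, key_token) % len(ring)
    _, primary, primary_dc = ring[i]
    replicas = [primary]
    for step in range(1, len(ring)):
        _, node, dc = ring[(i + step) % len(ring)]
        if dc == primary_dc:
            continue  # same datacenter as the primary, keep walking
        if preferred_dc is None or dc == preferred_dc:
            replicas.append(node)
            break
    return replicas
```

With preferred_dc left as None this behaves like "next node in a different
datacenter"; passing a datacenter name gives the "specific other
datacenter" policy asked about above.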


b
