incubator-cassandra-user mailing list archives

From Mike Gallamore <mike.e.gallam...@googlemail.com>
Subject Re: Deployment on AWS and replication strategies
Date Sun, 04 Apr 2010 16:14:15 GMT
Pluggable placement: that is cool. It wasn't obvious to me from the documentation I read
that this was available. I thought maybe rackaware and rackunaware were hard-coded
somewhere. I'm not a Java developer, so I haven't looked at the code much. That said,
I'll take a look and see if I can figure out how it works. I have coded in C/C++, so I can
probably handle the logic part of the Java code okay.
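
As a rough illustration of what a placement policy decides, here is a minimal Python sketch. It is a toy model of a token ring, not Cassandra's Java placement API: the `Node` tuple, the function names, and the datacenter labels are all hypothetical. It contrasts the behavior described in this thread (second copy on the next node in the ring that is in a different datacenter) with a policy that prefers one specific other datacenter.

```python
from bisect import bisect_left
from collections import namedtuple

# Hypothetical toy model of a ring member; not a Cassandra class.
Node = namedtuple("Node", "token name datacenter")


def walk_ring(ring, key_token):
    """Return all nodes in ring order, starting at the node that owns key_token.

    The owner is taken to be the first node whose token is >= key_token,
    wrapping around to the lowest token when none is.
    """
    ordered = sorted(ring, key=lambda n: n.token)
    tokens = [n.token for n in ordered]
    start = bisect_left(tokens, key_token) % len(ordered)
    return [ordered[(start + i) % len(ordered)] for i in range(len(ordered))]


def next_other_dc(ring, key_token):
    """Owner plus the next node on the ring that is in a different datacenter."""
    primary, *rest = walk_ring(ring, key_token)
    remote = next((n for n in rest if n.datacenter != primary.datacenter), None)
    return [primary] + ([remote] if remote else [])


def preferred_dc(ring, key_token, preferred):
    """Owner plus the next node on the ring that is in the preferred datacenter.

    Falls back to next_other_dc when the owner is already in the preferred
    datacenter or when no node there exists.
    """
    primary, *rest = walk_ring(ring, key_token)
    if primary.datacenter != preferred:
        remote = next((n for n in rest if n.datacenter == preferred), None)
        if remote:
            return [primary, remote]
    return next_other_dc(ring, key_token)


ring = [
    Node(0, "a", "us-east"),
    Node(25, "b", "eu-west"),
    Node(50, "c", "us-east"),
    Node(75, "d", "us-west"),
]
print([n.name for n in next_other_dc(ring, 10)])
print([n.name for n in preferred_dc(ring, 10, "us-west")])
```

With this ring, a key with token 10 is owned by node b. The first policy puts the second copy on c, simply because it is next in the ring; the second policy skips ahead to d because that is the datacenter asked for.
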
On 2010-04-04, at 1:18 AM, Benjamin Black wrote:

> On Sat, Apr 3, 2010 at 8:23 PM, Mike Gallamore
> <mike.e.gallamore@googlemail.com> wrote:
>>> 
>> I didn't mean a real-time determination, more the case where the nodes aren't identical.
>> For example, if you have a cluster made up of a bunch of EC2 light instances and decide
>> to add a large instance, it would be nice if the new node got a proportional amount of
>> work based on its system specs.
> 
> Sure, set the token(s) appropriately.
> 
>>> 
>>>> perhaps a preferred hash range not just a token (and presumably everything
>>>> else would automatically rebalance itself to make that happen)
>>>> 
>>> 
>>> Unclear what this would do.
>> Well, rather than getting half of the busiest node's work (which is how I understand
>> it works now), you'd get an amount of work that is proportional to the power of the node.
> 
> Assuming you allow it to automatically assign its own token, the new
> node will get half the range of the node with the most data, not the
> most 'busy'.  The amount of work being done by the nodes is not a
> consideration, nor would you want automatic selection of that within
> cassandra except with significant support for long term trend
> collection and analysis, pluggable policies for calculating 'load',
> etc.
> 
>>> 
>>> Or just set the token specifically for each node you bootstrap.
>>> Starting a node and crossing your fingers on its token selection is a
>>> recipe for interesting times :)
>> Can you specify a token based on a real key value? How do you know what token to
>> use to make sure that locally relevant data gets at least one copy stored locally?
> 
> Again, placement strategy is what you want to investigate.
> 
>> My understanding is rackawarestrategy puts the data on the next node in the token
>> ring that is in a different datacenter. The problem is if you want a specific "other
>> datacenter", not just the next one in the list.
> 
> Right, I suggested looking at the source as an example.  If you want a
> more sophisticated placement policy, write one.  They are not
> complicated and you will have a much deeper understanding of the
> mechanism.  IMO, pluggable placement is a remarkable feature.
> 
> 
> b
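
To make "set the token(s) appropriately" concrete, here is a minimal Python sketch of choosing tokens so that each node's share of the ring is proportional to a capacity weight. It assumes a RandomPartitioner-style token space of 0 to 2**127 and that a node owns the range from the previous token up to and including its own. The function names, node names, and weights are hypothetical, and picking sensible weights for a given instance type is left open.

```python
# Assumed token space (RandomPartitioner-style); adjust for other partitioners.
RING_SIZE = 2 ** 127


def weighted_tokens(weights, ring_size=RING_SIZE):
    """Return one token per node, sized in proportion to integer weights.

    Because a node owns (previous token, own token], the tokens are the
    running totals of the weights scaled to the ring size. Integer
    arithmetic keeps the 127-bit values exact.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    tokens = {}
    running = 0
    for node, weight in weights.items():
        running += weight
        tokens[node] = (running * ring_size // total) % ring_size
    return tokens


def ownership(tokens, ring_size=RING_SIZE):
    """Return the fraction of the ring each node owns for the given tokens."""
    ordered = sorted(tokens.items(), key=lambda item: item[1])
    shares = {}
    for i, (node, token) in enumerate(ordered):
        prev = ordered[i - 1][1]  # index -1 wraps to the highest token
        span = (token - prev) % ring_size or ring_size  # lone node owns everything
        shares[node] = span / ring_size
    return shares


# Hypothetical cluster: three small instances and one large one weighted 4x.
weights = {"small-1": 1, "small-2": 1, "small-3": 1, "large-1": 4}
tokens = weighted_tokens(weights)
for node, share in ownership(tokens).items():
    print(f"{node}: token={tokens[node]} share={share:.3f}")
```

In this example the large instance ends up owning roughly 4/7 of the ring and each small instance roughly 1/7. Note that this only balances the key range, which tracks data volume under an evenly hashing partitioner; as pointed out above, it says nothing about how busy each node actually is.
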

