helix-user mailing list archives

From Tom Widmer <Tom.Wid...@camcog.com>
Subject Auto-rebalancing question
Date Thu, 06 Nov 2014 13:56:00 GMT
Hi all,

Firstly, thanks for open-sourcing this useful and powerful framework!

Secondly, I have a question about full auto-rebalancing. I have some resources that I use
with the OnlineOffline state model but which only have a single partition at the moment
(essentially, Helix is just giving me a simple leader election to decide who controls the
resource - I don’t care which participant has it, as long as only one does). However, with
full auto rebalance, I find that the ‘first’ instance (alphabetically, I think) always gets
the resource when it’s up. So if I take down the first node so that the partition transfers
to the 2nd node, then bring the 1st node back up, the resource transfers back unnecessarily.

Note that this issue also affects multi-partition resources; it’s just a bit less noticeable.
With 3 nodes and 4 partitions, say, the partitions are always allocated 2, 1, 1. So if node 1
is down and the allocation is hence 0, 2, 2, bringing node 1 back up unnecessarily moves 2
partitions to restore 2, 1, 1, rather than making the minimum move needed to achieve ‘balance’,
which would be to move 1 partition from instance 2 or 3 back to instance 1.
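
The arithmetic above can be sketched as follows. This is only an illustration, not the Helix code: the class and method names are hypothetical, and it assumes nothing more than "each instance gets the floor of partitions / instances, and the remainder goes to the first instances in order". It ignores replicas and states entirely.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is not the Helix implementation.
class RemainderFirstSketch {

    // Capacity per instance: floor(partitions / instances), with the
    // remainder handed out one each to the first instances in order.
    static int[] remainderFirstCapacity(int partitions, int instances) {
        int[] capacity = new int[instances];
        int base = partitions / instances;
        int remainder = partitions % instances;
        for (int i = 0; i < instances; i++) {
            capacity[i] = base + (i < remainder ? 1 : 0);
        }
        return capacity;
    }

    // Number of partitions held above an instance's capacity, i.e. the
    // partitions that must move for the allocation to fit the capacities.
    static int forcedMoves(int[] current, int[] capacity) {
        int moves = 0;
        for (int i = 0; i < current.length; i++) {
            moves += Math.max(0, current[i] - capacity[i]);
        }
        return moves;
    }

    public static void main(String[] args) {
        // 4 partitions, 3 instances; node 1 has just come back up.
        int[] capacity = remainderFirstCapacity(4, 3);
        System.out.println(Arrays.toString(capacity));                  // [2, 1, 1]
        System.out.println(forcedMoves(new int[] {0, 2, 2}, capacity)); // 2

        // 1 partition, 2 instances; the 2nd node currently holds it.
        int[] single = remainderFirstCapacity(1, 2);
        System.out.println(Arrays.toString(single));                    // [1, 0]
        System.out.println(forcedMoves(new int[] {0, 1}, single));      // 1
    }
}
```

Under these assumptions the returning first node always forces moves: 2 in the 4-partition case and 1 in the single-partition case.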

I can see the code in question in AutoRebalanceStrategy.typedComputePartitionAssignment, where
the distRemainder is allocated to the first nodes, so that the capacities of the nodes are not
all equal. Is there any reason why the capacity shouldn’t be rounded up, so that in the 1 partition
case, all live instances have a capacity of 1, and in the 3 instances, 4 partitions case above,
all instances have a capacity of 2? This would then allow the rebalance algorithm to move fewer
partitions around when instances come up or go down, without losing ‘balance’.
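
A minimal sketch of the rounded-up capacity, again with hypothetical names and not taken from Helix. It treats an allocation as ‘balanced’ when every instance holds between floor and ceiling of partitions / instances, and counts the fewest single-partition moves needed to get there; that definition of balance is an assumption made for the example.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is not the Helix implementation.
class CeilingCapacitySketch {

    // Every instance gets the same capacity: ceil(partitions / instances).
    static int[] ceilingCapacity(int partitions, int instances) {
        int[] capacity = new int[instances];
        Arrays.fill(capacity, (partitions + instances - 1) / instances);
        return capacity;
    }

    // Fewest moves so that every instance ends up holding between
    // floor(total / n) and ceil(total / n) partitions.
    static int minMovesToBalance(int[] current) {
        int n = current.length;
        int total = Arrays.stream(current).sum();
        int floor = total / n;
        int ceil = (total + n - 1) / n;
        int deficit = 0; // partitions missing below the floor
        int excess = 0;  // partitions held above the ceiling
        for (int held : current) {
            deficit += Math.max(0, floor - held);
            excess += Math.max(0, held - ceil);
        }
        // One move can fix one unit of deficit and one unit of excess at once.
        return Math.max(deficit, excess);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(ceilingCapacity(4, 3)));  // [2, 2, 2]
        System.out.println(minMovesToBalance(new int[] {0, 2, 2}));  // 1
        System.out.println(Arrays.toString(ceilingCapacity(1, 2)));  // [1, 1]
        System.out.println(minMovesToBalance(new int[] {0, 1}));     // 0
    }
}
```

With equal, rounded-up capacities no instance is over capacity in either scenario, so the only move required is the single one that gives the returning node a partition in the 4-partition case, and none at all in the single-partition case.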

For my immediate problem, is there a better way to hold leadership elections for non-partitioned
resources than to have a resource with a partition count of 1? Or should I perhaps use a single
partitioned resource for all non-partitioned resources and assign them each a partition number?

Thanks,

Tom
