helix-user mailing list archives

From Vinoth Chandar <vin...@uber.com>
Subject Re: Balancing out skews in FULL_AUTO mode with built-in rebalancer
Date Fri, 25 Mar 2016 20:24:30 GMT
Okay, thanks for the lead. Will try this and report back.

On Friday, March 25, 2016, kishore g <g.kishore@gmail.com> wrote:

> So computeOrphaned is the one that's causing the behavior.
>
> In the beginning, when nothing is assigned, all replicas are considered
> orphans. Once a replica is considered an orphan, it gets assigned to any
> random node (this overrides everything that's computed by the placement
> scheme).
>
> I think the logic in computeOrphaned is broken: a replica should be
> treated as an orphan only if its preferred node is not part of the live
> node list.
>
> Try this in computeOrphaned. Note that the test case might fail because of
> this change, and you might have to update it to match the new behavior.
> I think it would be good to put this behavior behind a cluster config
> parameter.
>
>   private Set<Replica> computeOrphaned() {
>     Set<Replica> orphanedPartitions = new TreeSet<Replica>();
>     // A replica is a candidate orphan only if its preferred node is not live.
>     for (Entry<Replica, Node> entry : _preferredAssignment.entrySet()) {
>       if (!_liveNodesList.contains(entry.getValue())) {
>         orphanedPartitions.add(entry.getKey());
>       }
>     }
>     // Replicas that already have an assignment (preferred or non-preferred)
>     // are not orphans, even if their preferred node is down.
>     orphanedPartitions.removeAll(_existingPreferredAssignment.keySet());
>     orphanedPartitions.removeAll(_existingNonPreferredAssignment.keySet());
>
>     return orphanedPartitions;
>   }
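To see why the startup case goes wrong, here is a self-contained sketch comparing the two orphan rules on plain strings. The class name `OrphanRuleSketch`, the `useLiveNodeRule` flag (a hypothetical stand-in for the cluster config parameter suggested above; no parameter name is given in the thread), and the use of `String` ids for replicas and nodes are all assumptions. The legacy rule is reconstructed from the description in this message (every replica without an existing assignment is an orphan), not copied from Helix.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class OrphanRuleSketch {
  /**
   * Hypothetical stand-in for computeOrphaned(), using String ids.
   * useLiveNodeRule = true  -> proposed rule: orphaned only if the preferred node is not live
   * useLiveNodeRule = false -> legacy rule (as described in the thread): every replica
   *                            without an existing assignment is orphaned
   */
  static Set<String> computeOrphaned(Map<String, String> preferredAssignment,
                                     Set<String> liveNodes,
                                     Set<String> existingPreferred,
                                     Set<String> existingNonPreferred,
                                     boolean useLiveNodeRule) {
    Set<String> orphaned = new TreeSet<>();
    for (Map.Entry<String, String> e : preferredAssignment.entrySet()) {
      if (!useLiveNodeRule || !liveNodes.contains(e.getValue())) {
        orphaned.add(e.getKey());
      }
    }
    // Replicas that already have an assignment are never orphans.
    orphaned.removeAll(existingPreferred);
    orphaned.removeAll(existingNonPreferred);
    return orphaned;
  }

  public static void main(String[] args) {
    // Startup scenario from the thread: nothing assigned yet, all 8 servers live.
    Map<String, String> preferred = new TreeMap<>();
    preferred.put("countLog-2a_0|0", "localhost-server-6");
    preferred.put("countLog-2a_1|0", "localhost-server-7");
    Set<String> live = new TreeSet<>();
    for (int i = 1; i <= 8; i++) {
      live.add("localhost-server-" + i);
    }
    Set<String> none = new TreeSet<>();

    // legacy:   [countLog-2a_0|0, countLog-2a_1|0]
    System.out.println("legacy:   " + computeOrphaned(preferred, live, none, none, false));
    // proposed: []
    System.out.println("proposed: " + computeOrphaned(preferred, live, none, none, true));
  }
}
```

Under the legacy rule both replicas come out orphaned, matching the `Orphaned:` line in the log further down the thread; under the proposed rule nothing is orphaned, so the preferred placement on server-6/server-7 would be kept.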
>
> On Fri, Mar 25, 2016 at 8:41 AM, Vinoth Chandar <vinoth@uber.com> wrote:
>
>> Here you go
>>
>> https://gist.github.com/vinothchandar/18feedfa84650e3efdc0
>>
>>
>> On Fri, Mar 25, 2016 at 8:32 AM, kishore g <g.kishore@gmail.com> wrote:
>>
>>> Can you point me to your code (fork/patch)?
>>>
>>> On Fri, Mar 25, 2016 at 5:26 AM, Vinoth Chandar <vinoth@uber.com> wrote:
>>>
>>>> Hi Kishore,
>>>>
>>>> I printed out more information and trimmed the test down to 1 resource
>>>> with 2 partitions; I bring up 8 servers in parallel.
>>>>
>>>> Below is my logging output with annotations.
>>>>
>>>> >>> Computing partition assignment
>>>> >>>> NodeShift for countLog-2a 0 is 5, index 5
>>>> >>>> NodeShift for countLog-2a 1 is 5, index 6
>>>>
>>>> VC: So this part seems fine. We pick nodes at index 5 & 6 instead of
>>>> 0 & 1.
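The two log lines above are consistent with a placement of the form index = (partitionId + nodeShift) mod numNodes: with 8 nodes and a shift of 5, partition 0 lands on index 5 and partition 1 on index 6, i.e. localhost-server-6 and localhost-server-7 with 1-based naming. A minimal sketch of that arithmetic follows; the formula is inferred from the log rather than taken from the patch, the class and method names are hypothetical, and `String.hashCode` is only a dependency-free stand-in for the murmur hash the patch reportedly uses.

```java
public class NodeShiftSketch {
  // Stand-in shift: the patch reportedly uses a murmur hash; String.hashCode is used here
  // only so the sketch has no dependencies. floorMod keeps the result in [0, numNodes).
  static int nodeShift(String resourceName, int numNodes) {
    return Math.floorMod(resourceName.hashCode(), numNodes);
  }

  // Placement inferred from the log: rotate the partition id by the per-resource shift.
  static int nodeIndex(int partitionId, int shift, int numNodes) {
    return (partitionId + shift) % numNodes;
  }

  public static void main(String[] args) {
    int numNodes = 8;
    int shift = 5; // the value logged for countLog-2a
    for (int p = 0; p < 2; p++) {
      int index = nodeIndex(p, shift, numNodes);
      // Servers were named starting from 1, so index i is localhost-server-(i + 1).
      System.out.println("partition " + p + " -> index " + index
          + " -> localhost-server-" + (index + 1));
    }
    System.out.println("hash-based shift for countLog-2a: " + nodeShift("countLog-2a", numNodes));
  }
}
```

The rotation only decides the preferred node; as the rest of the thread shows, a later reconciliation step can still override it.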
>>>>
>>>> >>>>  Preferred Assignment: {countLog-2a_0|0=##########
>>>> name=localhost-server-6
>>>> preferred:0
>>>> nonpreferred:0, countLog-2a_1|0=##########
>>>> name=localhost-server-7
>>>> preferred:0
>>>> nonpreferred:0}
>>>>
>>>> VC: This translates to server-6/server-7 (since I numbered them starting from 1).
>>>>
>>>> >>>>  Existing Preferred Assignment: {}
>>>> >>>>  Existing Non Preferred Assignment: {}
>>>> >>>>  Orphaned: [countLog-2a_0|0, countLog-2a_1|0]
>>>> >>> Final State Map :{0=ONLINE}
>>>> >>>> Final ZK record : countLog-2a,
>>>> {}{countLog-2a_0={localhost-server-1=ONLINE},
>>>> countLog-2a_1={localhost-server-1=ONLINE}}{countLog-2a_0=[localhost-server-1],
>>>> countLog-2a_1=[localhost-server-1]}
>>>>
>>>> VC: But the final effect still seems to be assigning the partitions to
>>>> servers 1 & 2 (first two).
>>>>
>>>> Any ideas on where to start poking?
>>>>
>>>>
>>>> Thanks
>>>> Vinoth
>>>>
>>>> On Tue, Mar 15, 2016 at 5:52 PM, Vinoth Chandar <vinoth@uber.com> wrote:
>>>>
>>>>> Hi Kishore,
>>>>>
>>>>> I think the changes I made are exercised when computing the preferred
>>>>> assignment, but they do not seem to take effect later, when reconciliation
>>>>> happens with the existing assignment, orphaned partitions, etc.
>>>>>
>>>>> In the effective assignment I saw, all partitions (2 per resource) were
>>>>> assigned to the first 2 servers. I started digging into the above-mentioned
>>>>> parts of the code and will report back tomorrow when I pick this back up.
>>>>>
>>>>> Thanks,
>>>>> Vinoth
>>>>>
>>>>> _____________________________
>>>>> From: kishore g <g.kishore@gmail.com>
>>>>> Sent: Tuesday, March 15, 2016 2:01 PM
>>>>> Subject: Re: Balancing out skews in FULL_AUTO mode with built-in
>>>>> rebalancer
>>>>> To: <user@helix.apache.org>
>>>>>
>>>>>
>>>>>
>>>>> 1) I am guessing it gets overridden by other logic in
>>>>> computePartitionAssignment(..); the end assignment is still skewed.
>>>>>
>>>>> What is the logic you are referring to?
>>>>>
>>>>> Can you print the assignment count for your use case?
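One way to answer this is a small helper that counts replicas per instance from a partition-to-instances map, such as the list fields of the final ZK record shown in the log earlier on this page. This is a sketch assuming the map has already been read out of the IdealState (how it is fetched is not shown in the thread); the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AssignmentCountSketch {
  // Counts how many replicas each instance hosts, given partition -> preference list.
  static Map<String, Integer> countPerInstance(Map<String, List<String>> listFields) {
    Map<String, Integer> counts = new TreeMap<>();
    for (List<String> instances : listFields.values()) {
      for (String instance : instances) {
        counts.merge(instance, 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    // The list fields from the final ZK record in the log.
    Map<String, List<String>> listFields = new TreeMap<>();
    listFields.put("countLog-2a_0", Arrays.asList("localhost-server-1"));
    listFields.put("countLog-2a_1", Arrays.asList("localhost-server-1"));
    System.out.println(countPerInstance(listFields)); // {localhost-server-1=2}
  }
}
```

Instances that host nothing do not appear in the result, so idle nodes show up only when the counts are compared against the live instance list.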
>>>>>
>>>>>
>>>>> thanks,
>>>>> Kishore G
>>>>>
>>>>> On Tue, Mar 15, 2016 at 1:45 PM, Vinoth Chandar <vinoth@uber.com> wrote:
>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> We are hitting a fairly well-known issue: we have 100s of resources,
>>>>>> each with < 8 partitions, spread across 10 servers, and the built-in
>>>>>> assignment always assigns partitions from the first node to the last,
>>>>>> resulting in heavy skew toward a few nodes.
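To make the skew concrete: if every resource places partition p on node index p (no per-resource shift), then with 10 servers and 2-partition resources only the first two servers ever receive partitions. The simulation below assumes one replica per partition and that the unshifted placement is exactly index = p mod numNodes, a simplification of the built-in strategy. For contrast it also uses an idealized round-robin shift (shift = r mod numNodes); that is an illustrative stand-in, not the hash-based shift from the patch, and all names are hypothetical.

```java
import java.util.Arrays;

public class FirstToLastSkewSketch {
  // Counts partitions per node when every resource fills nodes from index 0 upward
  // (assumption: one replica per partition, placement index = partitionId % numNodes).
  static int[] countUnshifted(int numResources, int partitionsPerResource, int numNodes) {
    int[] counts = new int[numNodes];
    for (int r = 0; r < numResources; r++) {
      for (int p = 0; p < partitionsPerResource; p++) {
        counts[p % numNodes]++;
      }
    }
    return counts;
  }

  // For contrast: an idealized per-resource rotation (shift = r % numNodes). This is a
  // round-robin stand-in, not the hash-based shift from the patch.
  static int[] countShifted(int numResources, int partitionsPerResource, int numNodes) {
    int[] counts = new int[numNodes];
    for (int r = 0; r < numResources; r++) {
      int shift = r % numNodes;
      for (int p = 0; p < partitionsPerResource; p++) {
        counts[(p + shift) % numNodes]++;
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    // 100 resources x 2 partitions on 10 servers
    System.out.println(Arrays.toString(countUnshifted(100, 2, 10))); // [100, 100, 0, 0, 0, 0, 0, 0, 0, 0]
    System.out.println(Arrays.toString(countShifted(100, 2, 10)));   // [20, 20, 20, 20, 20, 20, 20, 20, 20, 20]
  }
}
```

A hash-based shift will generally not be this even: with only a few hundred resources, hash values fall unevenly across 10 buckets, which is consistent with the residual nodeShift skew mentioned in point 2 of this message.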
>>>>>>
>>>>>> I chatted with Kishore offline and made a patch here:
>>>>>> <https://gist.github.com/vinothchandar/e8837df301501f85e257>. I tested
>>>>>> with 5 resources with 2 partitions each across 8 servers; logging out the
>>>>>> nodeShift and the ultimate index picked does indicate that we choose
>>>>>> servers other than the first two, which is good.
>>>>>>
>>>>>> But:
>>>>>> 1) I am guessing it gets overridden by other logic in
>>>>>> computePartitionAssignment(..); the end assignment is still skewed.
>>>>>> 2) Even with murmur hash, there is some skew in the nodeShift, which
>>>>>> needs to be ironed out.
>>>>>>
>>>>>> I will keep chipping away at this. Any feedback is appreciated.
>>>>>>
>>>>>> Thanks
>>>>>> Vinoth
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
