helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: Balancing out skews in FULL_AUTO mode with built-in rebalancer
Date Fri, 25 Mar 2016 19:20:34 GMT
So computeOrphaned() is the one that's causing the behavior.

In the beginning, when nothing is assigned yet, all replicas are considered
orphans. Once a replica is considered an orphan, it gets assigned to an
arbitrary node, and that overrides everything computed by the placement
scheme.
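
Roughly, the fallback path does something like this (a simplified sketch with
approximate helper names, not the exact AutoRebalanceStrategy code):

  // Sketch only: every orphaned replica is handed to some live node that can
  // take it, ignoring whatever the placement scheme computed for it.
  // canAdd()/add() are illustrative names.
  private void assignOrphans(Set<Replica> orphans, List<Node> liveNodes) {
    for (Replica replica : orphans) {
      for (Node node : liveNodes) {      // iteration order is unrelated to the preferred placement
        if (node.canAdd(replica)) {
          node.add(replica);             // overrides whatever the placement scheme computed
          break;
        }
      }
    }
  }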

I think the logic in computeOrphaned is broken: a replica should be treated
as an orphan only if its preferred node is not part of the live-node list.

Try this in computeOrphaned. Note that the test cases might fail because of
this change, and you might have to update them to match the new behavior. I
think it would be good to introduce this behavior behind a cluster config
parameter.

  private Set<Replica> computeOrphaned() {
    Set<Replica> orphanedPartitions = new TreeSet<Replica>();
    // A replica is an orphan only if its preferred node is not live.
    for (Entry<Replica, Node> entry : _preferredAssignment.entrySet()) {
      if (!_liveNodesList.contains(entry.getValue())) {
        orphanedPartitions.add(entry.getKey());
      }
    }
    // Replicas that already have an assignment (preferred or not) are not orphans.
    for (Replica r : _existingPreferredAssignment.keySet()) {
      orphanedPartitions.remove(r);
    }
    for (Replica r : _existingNonPreferredAssignment.keySet()) {
      orphanedPartitions.remove(r);
    }

    return orphanedPartitions;
  }
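
For the config gating, I'm imagining something along these lines (the
property name, _clusterProperties, and computeOrphanedLegacy() are made-up
names, just to show the idea):

  // Hypothetical: gate the stricter orphan check behind a cluster config
  // property, so the old behavior stays the default.
  private boolean strictOrphanCheckEnabled() {
    return Boolean.parseBoolean(_clusterProperties.get("rebalance.strictOrphanCheck"));
  }

  private Set<Replica> computeOrphanedReplicas() {
    return strictOrphanCheckEnabled()
        ? computeOrphaned()          // new behavior: orphan only if preferred node is not live
        : computeOrphanedLegacy();   // old behavior, unchanged
  }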

On Fri, Mar 25, 2016 at 8:41 AM, Vinoth Chandar <vinoth@uber.com> wrote:

> Here you go
>
> https://gist.github.com/vinothchandar/18feedfa84650e3efdc0
>
>
> On Fri, Mar 25, 2016 at 8:32 AM, kishore g <g.kishore@gmail.com> wrote:
>
>> Can you point me to your code? Fork/patch?
>>
>> On Fri, Mar 25, 2016 at 5:26 AM, Vinoth Chandar <vinoth@uber.com> wrote:
>>
>>> Hi Kishore,
>>>
>>> I printed out more information and trimmed the test down to 1 resource
>>> with 2 partitions, and I bring up 8 servers in parallel.
>>>
>>> Below is the paste of my logging output + annotations.
>>>
>>> >>> Computing partition assignment
>>> >>>> NodeShift for countLog-2a 0 is 5, index 5
>>> >>>> NodeShift for countLog-2a 1 is 5, index 6
>>>
>>> VC: So this part seems fine. We pick nodes at index 5 & 6 instead of 0, 1
>>>
>>> >>>>  Preferred Assignment: {countLog-2a_0|0=##########
>>> name=localhost-server-6
>>> preferred:0
>>> nonpreferred:0, countLog-2a_1|0=##########
>>> name=localhost-server-7
>>> preferred:0
>>> nonpreferred:0}
>>>
>>> VC: This translates to server-6/server-7 (since I named them starting from 1)
>>>
>>> >>>>  Existing Preferred Assignment: {}
>>> >>>>  Existing Non Preferred Assignment: {}
>>> >>>>  Orphaned: [countLog-2a_0|0, countLog-2a_1|0]
>>> >>> Final State Map :{0=ONLINE}
>>> >>>> Final ZK record : countLog-2a,
>>> {}{countLog-2a_0={localhost-server-1=ONLINE},
>>> countLog-2a_1={localhost-server-1=ONLINE}}{countLog-2a_0=[localhost-server-1],
>>> countLog-2a_1=[localhost-server-1]}
>>>
>>> VC: But the final effect still seems to be assigning the partitions to
>>> servers 1 & 2 (first two).
>>>
>>> Any ideas on where to start poking?
>>>
>>>
>>> Thanks
>>> Vinoth
>>>
>>> On Tue, Mar 15, 2016 at 5:52 PM, Vinoth Chandar <vinoth@uber.com> wrote:
>>>
>>>> Hi Kishore,
>>>>
>>>> I think the changes I made are exercised when computing the preferred
>>>> assignment, but later, when reconciliation happens with the existing
>>>> assignment/orphaned partitions etc., they do not take effect.
>>>>
>>>> The effective assignment I saw was that all partitions (2 per resource) were
>>>> assigned to the first 2 servers. I started digging into the above-mentioned
>>>> parts of the code; I will report back tomorrow when I pick this back up.
>>>>
>>>> Thanks,
>>>> Vinoth
>>>>
>>>> _____________________________
>>>> From: kishore g <g.kishore@gmail.com>
>>>> Sent: Tuesday, March 15, 2016 2:01 PM
>>>> Subject: Re: Balancing out skews in FULL_AUTO mode with built-in
>>>> rebalancer
>>>> To: <user@helix.apache.org>
>>>>
>>>>
>>>>
>>>> 1) I am guessing it gets overridden by other logic in
>>>> computePartitionAssignment(..), so the end assignment is still skewed.
>>>>
>>>> What is the logic you are referring to?
>>>>
>>>> Can you print the assignment count for your use case?
>>>>
>>>>
>>>> thanks,
>>>> Kishore G
>>>>
>>>> On Tue, Mar 15, 2016 at 1:45 PM, Vinoth Chandar <vinoth@uber.com>
>>>> wrote:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> We are hitting a fairly well-known issue: we have 100s of resources,
>>>>> each with < 8 partitions, spread across 10 servers, and the built-in
>>>>> assignment always assigns partitions to nodes in order from first to
>>>>> last, resulting in heavy skew on the first few nodes.
>>>>>
>>>>> Chatted with Kishore offline and made a patch, as here:
>>>>> <https://gist.github.com/vinothchandar/e8837df301501f85e257>. Tested
>>>>> with 5 resources with 2 partitions each across 8 servers; logging out the
>>>>> nodeShift & the ultimate index picked does indicate that we choose servers
>>>>> other than the first two, which is good.
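>>>>>
>>>>> The core idea (simplified below with made-up names; the gist has the
>>>>> actual change, which uses murmur hash rather than hashCode) is to offset
>>>>> the starting node per resource by hashing the resource name, instead of
>>>>> always starting at index 0:
>>>>>
>>>>>   // Illustrative sketch only: derive a per-resource shift so different
>>>>>   // resources start their assignment at different nodes.
>>>>>   int nodeShift = (resourceName.hashCode() & 0x7fffffff) % liveNodes.size();
>>>>>   for (int i = 0; i < replicas.size(); i++) {
>>>>>     int index = (nodeShift + i) % liveNodes.size();
>>>>>     assign(replicas.get(i), liveNodes.get(index)); // assign() is illustrative
>>>>>   }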
>>>>>
>>>>> But:
>>>>> 1) I am guessing it gets overridden by other logic in
>>>>> computePartitionAssignment(..), so the end assignment is still skewed.
>>>>> 2) Even with murmur hash, there is some skew on the nodeShift, which
>>>>> needs to be ironed out.
>>>>>
>>>>> I will keep chipping away at this. Any feedback appreciated.
>>>>>
>>>>> Thanks
>>>>> Vinoth
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
