tez-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-1567) Avoid blacklisting nodes when the disable blacklisting threshold is about to be hit
Date Wed, 22 Oct 2014 17:15:33 GMT

    [ https://issues.apache.org/jira/browse/TEZ-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180190#comment-14180190 ]

Bikas Saha commented on TEZ-1567:

I understand that, but I feel the logic is getting more convoluted (and the existing
logic is already convoluted because ignoreBlacklisting is computed as an after-effect).
My suggestion is to make this simpler (including the existing code) by doing the following:
1) The bad-machine threshold can be calculated a priori (whenever the number of
cluster nodes is updated). It's the same computeIgnoreBlacklisting() code. There is no need
to compute this after a machine is blacklisted. It's just a threshold calculation.
2) node.shouldBlacklistNode() can additionally check whether adding 1 to the current blacklisted
machine count would exceed the number calculated in 1).
3) To maintain the existing behavior of unblacklisting all nodes when the threshold is hit, if
2) declines to blacklist, then the node can send an event to AMNodeImpl that triggers sendIngoreBlacklistingStateToNodes(),
or execute the sendIngoreBlacklistingStateToNodes() code inside AMNodeImpl itself.
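
A rough sketch of the three steps above, to make the control flow concrete. This is not the Tez code: the class name, fields, the percentage-based threshold, and the "count + 1 must not exceed the threshold" comparison are all assumptions for illustration, and clearing the set stands in for whatever sendIngoreBlacklistingStateToNodes() actually does.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the node tracker; none of these names are from Tez.
final class BlacklistThresholdSketch {
    private final int maxBlacklistedPercent;   // assumed config value, e.g. 33
    private int maxBlacklistedNodes;           // step 1: threshold computed a priori
    private final Set<String> blacklisted = new HashSet<>();
    private boolean ignoreBlacklisting = false;

    BlacklistThresholdSketch(int maxBlacklistedPercent, int numClusterNodes) {
        this.maxBlacklistedPercent = maxBlacklistedPercent;
        setNumClusterNodes(numClusterNodes);
    }

    // Step 1: recompute the threshold only when the cluster size changes.
    // (Re-enabling blacklisting after the cluster grows is not modeled here.)
    void setNumClusterNodes(int numClusterNodes) {
        this.maxBlacklistedNodes = (numClusterNodes * maxBlacklistedPercent) / 100;
    }

    // Step 2: refuse to blacklist if one more node would exceed the threshold.
    boolean shouldBlacklistNode() {
        return !ignoreBlacklisting && blacklisted.size() + 1 <= maxBlacklistedNodes;
    }

    // Step 3: if step 2 declines, fall back to the existing behavior of
    // unblacklisting everything. Returns true if the node ends up blacklisted.
    boolean onNodeFailure(String nodeId) {
        if (blacklisted.contains(nodeId)) {
            return true;
        }
        if (shouldBlacklistNode()) {
            blacklisted.add(nodeId);
            return true;
        }
        ignoreBlacklisting = true;
        blacklisted.clear();   // stand-in for notifying all nodes
        return false;
    }

    int blacklistedCount() { return blacklisted.size(); }
    boolean isIgnoreBlacklisting() { return ignoreBlacklisting; }
    int maxBlacklistedNodes() { return maxBlacklistedNodes; }
}
```

With 10 nodes and an assumed 33% limit, the threshold is 3: the first three failing nodes are blacklisted, and the fourth failure trips the ignore-blacklisting path without that node ever being added.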

> Avoid blacklisting nodes when the disable blacklisting threshold is about to be hit
> -----------------------------------------------------------------------------------
>                 Key: TEZ-1567
>                 URL: https://issues.apache.org/jira/browse/TEZ-1567
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: TEZ-1567.1.txt

This message was sent by Atlassian JIRA
