apex-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXCORE-393) Reset failure count when consecutive failed node is removed from blacklist
Date Tue, 22 Mar 2016 23:26:25 GMT

    [ https://issues.apache.org/jira/browse/APEXCORE-393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207511#comment-15207511 ]

ASF GitHub Bot commented on APEXCORE-393:
-----------------------------------------

Github user tweise commented on a diff in the pull request:

    https://github.com/apache/incubator-apex-core/pull/274#discussion_r57087518
  
    --- Diff: engine/src/main/java/com/datatorrent/stram/StreamingAppMasterService.java ---
    @@ -781,18 +794,19 @@ private void execute() throws YarnException, IOException
          /* Remove nodes from blacklist after timeout */
           long currentTime = System.currentTimeMillis();
           List<String> blacklistRemovals = new ArrayList<String>();
    -      for (Iterator<Pair<Long, List<String>>> it = blacklistedNodesQueueWithTimeStamp.iterator(); it.hasNext();) {
    -        Pair<Long, List<String>> entry = it.next();
    -        Long timeDiff = currentTime - entry.getFirst();
    -        if (timeDiff > blacklistRemovalTime) {
    -          blacklistRemovals.addAll(entry.getSecond());
    -          it.remove();
    -        } else {
    -          break;
    +      for (String hostname : failedBlackListedNodes) {
    +        Long timeDiff = currentTime - failedContainerNodesMap.get(hostname).blackListAdditionTime;
    +        if (timeDiff >= blacklistRemovalTime) {
    +          blacklistRemovals.add(hostname);
    +          failedContainerNodesMap.remove(hostname);
             }
           }
    +
           if (!blacklistRemovals.isEmpty()) {
             amRmClient.updateBlacklist(null, blacklistRemovals);
    +        LOG.info("Removing nodes {} from blacklist: time elapsed since last blacklisting due to failure is greater than specified timeout", blacklistRemovals.toString());
    +
    --- End diff --
    
    Minor nit: a few extra blank lines were added here...
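
For readers following the thread, the behaviour the diff is aiming for (drop a node from the blacklist once the timeout has elapsed, and reset its failure count at the same time) can be sketched roughly as below. This is only an illustration under assumptions, not the code in StreamingAppMasterService: the class `BlacklistTracker`, the `NodeFailureStats` holder, and the methods `recordFailure`, `failureCount`, and `expire` are hypothetical names; only `blackListAdditionTime` and `blacklistRemovalTime` are borrowed from the diff. The YARN call `amRmClient.updateBlacklist(...)` is left out so the sketch stays self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of timeout-based blacklist removal; not the Apex implementation. */
final class BlacklistTracker {

  /** Per-node bookkeeping (hypothetical holder for the values kept per hostname). */
  static final class NodeFailureStats {
    int consecutiveFailures;
    long blackListAdditionTime = -1; // -1 means the node is not blacklisted
  }

  private final Map<String, NodeFailureStats> failedNodes = new HashMap<>();
  private final long blacklistRemovalTime;   // timeout in milliseconds
  private final int maxConsecutiveFailures;  // failures before a node is blacklisted

  BlacklistTracker(long blacklistRemovalTime, int maxConsecutiveFailures) {
    this.blacklistRemovalTime = blacklistRemovalTime;
    this.maxConsecutiveFailures = maxConsecutiveFailures;
  }

  /** Records a container failure; returns true if the node just became blacklisted. */
  boolean recordFailure(String hostname, long now) {
    NodeFailureStats stats = failedNodes.computeIfAbsent(hostname, h -> new NodeFailureStats());
    stats.consecutiveFailures++;
    if (stats.blackListAdditionTime < 0 && stats.consecutiveFailures >= maxConsecutiveFailures) {
      stats.blackListAdditionTime = now;
      return true;
    }
    return false;
  }

  /** Current consecutive failure count for a node (0 if nothing is tracked). */
  int failureCount(String hostname) {
    NodeFailureStats stats = failedNodes.get(hostname);
    return stats == null ? 0 : stats.consecutiveFailures;
  }

  /**
   * Returns the blacklisted nodes whose timeout has elapsed and drops their
   * stats entirely, which is what resets the failure count.
   */
  List<String> expire(long now) {
    List<String> removals = new ArrayList<>();
    Iterator<Map.Entry<String, NodeFailureStats>> it = failedNodes.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, NodeFailureStats> entry = it.next();
      long addedAt = entry.getValue().blackListAdditionTime;
      if (addedAt >= 0 && now - addedAt >= blacklistRemovalTime) {
        removals.add(entry.getKey());
        it.remove(); // safe removal while iterating over the map
      }
    }
    return removals;
  }
}
```

In this sketch the caller would pass the list returned by `expire(System.currentTimeMillis())` as the removals argument of the blacklist update, and skip the call when the list is empty, as the diff does.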


> Reset failure count when consecutive failed node is removed from blacklist
> --------------------------------------------------------------------------
>
>                 Key: APEXCORE-393
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-393
>             Project: Apache Apex Core
>          Issue Type: Bug
>            Reporter: Isha Arkatkar
>            Assignee: Isha Arkatkar
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
