hadoop-common-issues mailing list archives

From "Konstantin Boudnik (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-6979) System test framework needs to black list unresponsive cluster nodes after a timeout
Date Wed, 29 Sep 2010 16:08:34 GMT
System test framework needs to black list unresponsive cluster nodes after a timeout 
-------------------------------------------------------------------------------------

                 Key: HADOOP-6979
                 URL: https://issues.apache.org/jira/browse/HADOOP-6979
             Project: Hadoop Common
          Issue Type: Improvement
          Components: test
    Affects Versions: 0.22.0
            Reporter: Konstantin Boudnik


Sometimes one or more nodes in a cluster deployed for system testing become unresponsive
(hardware failure, Hadoop daemon crash, etc.). In the current implementation, Herriot keeps
trying to connect to such a node forever or until a timeout occurs. Instead, an unresponsive
node should be placed on a blacklist so that the framework can move on.

A cluster should be declared unusable if the NN or JT is placed on the blacklist, or if a
certain percentage of DNs (TTs) has been blacklisted.
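
A minimal sketch of the bookkeeping proposed above, for illustration only. The class
NodeBlacklist, its Role enum, and the maxWorkerFraction threshold are hypothetical names and
are not part of Herriot. The sketch assumes the caller decides when a connection attempt has
timed out, and that DN and TT daemons on the same host count as one worker node.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper; not an existing Herriot class. */
public class NodeBlacklist {
    /** Daemon roles named in the issue: NN and JT are masters, DN and TT are workers. */
    public enum Role { NN, JT, DN, TT }

    private final Set<String> blacklistedWorkerHosts = new HashSet<>();
    private final Set<String> blacklistedMasterHosts = new HashSet<>();
    private final int totalWorkerHosts;
    private final double maxWorkerFraction; // assumed threshold, e.g. 0.5 for 50%

    public NodeBlacklist(int totalWorkerHosts, double maxWorkerFraction) {
        this.totalWorkerHosts = totalWorkerHosts;
        this.maxWorkerFraction = maxWorkerFraction;
    }

    /** Record that a daemon on the given host stopped responding within the timeout. */
    public void blacklist(String host, Role role) {
        if (role == Role.NN || role == Role.JT) {
            blacklistedMasterHosts.add(host);
        } else {
            // A set keeps a host from being counted twice (DN and TT share a machine).
            blacklistedWorkerHosts.add(host);
        }
    }

    /** True if the framework should skip this host instead of reconnecting. */
    public boolean isBlacklisted(String host) {
        return blacklistedMasterHosts.contains(host)
            || blacklistedWorkerHosts.contains(host);
    }

    /** Unusable if any master is blacklisted or too many workers are. */
    public boolean isClusterUnusable() {
        if (!blacklistedMasterHosts.isEmpty()) {
            return true;
        }
        if (totalWorkerHosts <= 0) {
            return false;
        }
        double fraction = (double) blacklistedWorkerHosts.size() / totalWorkerHosts;
        return fraction >= maxWorkerFraction;
    }
}
```

A test harness could call blacklist(host, role) whenever a connection attempt exceeds its
timeout, consult isBlacklisted(host) before each later attempt, and abort the run once
isClusterUnusable() returns true.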

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

