hbase-dev mailing list archives

From "Harsh J (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-10424) HMaster could capture stacks of RSes it deems unresponsive during assignments
Date Mon, 27 Jan 2014 10:48:38 GMT
Harsh J created HBASE-10424:
-------------------------------

             Summary: HMaster could capture stacks of RSes it deems unresponsive during assignments
                 Key: HBASE-10424
                 URL: https://issues.apache.org/jira/browse/HBASE-10424
             Project: HBase
          Issue Type: Wish
          Components: Region Assignment
    Affects Versions: 0.96.0
            Reporter: Harsh J
            Priority: Trivial


A region often fails to get assigned because of timeouts, even while other regions are
assigned successfully. In this case, the Master appears to enter a never-ending retry loop
in which it retries each chosen server several times before moving on to another.

The Master is best placed to know when this is happening, and it could use that knowledge
to help capture the problem for debugging. It could set up a retry threshold N (counted in
number of servers tried) and, once that threshold is reached, run an HTTP GET against the
info port of the RS that is currently timing out, capture the output of its /stacks endpoint,
and dump it into the Master's own logs for later investigation.
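
A minimal sketch of the idea follows, in plain Java with no HBase dependencies. It is
not how the Master's assignment code is actually structured: the class name
StackCaptureSketch, the methods recordTimeout, stacksUrl and fetchStacks, and the
per-region bookkeeping are all hypothetical names chosen for illustration. The only
detail taken from the description above is the /stacks path on the RS info port.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: decides when to capture an RS stack dump, and fetches it. */
public class StackCaptureSketch {
    // N: how many distinct servers must time out for one region before we capture stacks.
    private final int serverThreshold;
    // Region name -> distinct servers that timed out while that region was being assigned.
    private final Map<String, Set<String>> timedOutServers = new HashMap<>();

    public StackCaptureSketch(int serverThreshold) {
        this.serverThreshold = serverThreshold;
    }

    /**
     * Records one assignment timeout. Returns true once the number of distinct
     * servers tried for this region has reached the threshold.
     */
    public boolean recordTimeout(String region, String serverHost) {
        Set<String> servers = timedOutServers.computeIfAbsent(region, r -> new HashSet<>());
        servers.add(serverHost); // repeated retries against the same server count once
        return servers.size() >= serverThreshold;
    }

    /** Builds the URL of the /stacks endpoint on a server's info port. */
    public static String stacksUrl(String host, int infoPort) {
        return "http://" + host + ":" + infoPort + "/stacks";
    }

    /** Runs an HTTP GET on the /stacks endpoint and returns the response body. */
    public static String fetchStacks(String host, int infoPort, int timeoutMs) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(stacksUrl(host, infoPort)).toURL().openConnection();
        conn.setRequestMethod("GET");
        // Bound both phases: the target server is, by assumption, already slow to respond.
        conn.setConnectTimeout(timeoutMs);
        conn.setReadTimeout(timeoutMs);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

A caller would invoke recordTimeout(region, host) on each assignment timeout and, when it
returns true, log the result of fetchStacks(host, infoPort, timeoutMs) for the server
that just timed out. The info port should be read from configuration rather than
hard-coded, and a failed fetch should only be logged, never allowed to interrupt the
assignment retries.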



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
