hbase-issues mailing list archives

From "Tianying Chang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-15155) Show All RPC handler tasks stop working after cluster is under heavy load for a while
Date Thu, 21 Jan 2016 23:05:39 GMT
Tianying Chang created HBASE-15155:
--------------------------------------

             Summary: Show All RPC handler tasks stop working after cluster is under heavy
load for a while
                 Key: HBASE-15155
                 URL: https://issues.apache.org/jira/browse/HBASE-15155
             Project: HBase
          Issue Type: Bug
          Components: monitoring
    Affects Versions: 0.94.19, 1.0.0, 0.98.0
            Reporter: Tianying Chang
            Assignee: Tianying Chang


After we upgraded from 0.94.7 to 0.94.26 and 1.0, we found that the "Show All RPC handler tasks"
link on the RS web UI stops working after the RS has run in a production cluster under
relatively high load for several days.

This turned out to be a bug introduced by https://issues.apache.org/jira/browse/HBASE-10312.
The BoundedFIFOBuffer causes the RPC handler status entries to be overwritten/removed
permanently when a spike of non-RPC task statuses exceeds MAX_SIZE (1000). So once the RS has
experienced "high" load even once, RPC status monitoring is gone until the RS is restarted.
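
The eviction described above can be illustrated with a minimal, self-contained sketch. It is
not HBase code: the class and method names (BoundedTaskListSketch, BoundedFifo, Task,
survivingHandlers) are hypothetical, and it assumes that RPC handler entries are registered
once and never re-added, which is consistent with only a restart restoring them.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BoundedTaskListSketch {

    /** Hypothetical stand-in for a monitored task; not an HBase class. */
    static final class Task {
        final String name;
        final boolean rpcHandler;

        Task(String name, boolean rpcHandler) {
            this.name = name;
            this.rpcHandler = rpcHandler;
        }
    }

    /** Bounded FIFO: when full, the oldest entry is dropped to make room. */
    static final class BoundedFifo {
        private final int maxSize;
        private final Deque<Task> tasks = new ArrayDeque<>();

        BoundedFifo(int maxSize) {
            if (maxSize <= 0) {
                throw new IllegalArgumentException("maxSize must be positive");
            }
            this.maxSize = maxSize;
        }

        void add(Task task) {
            if (tasks.size() >= maxSize) {
                // The oldest entry goes first, regardless of whether it is a
                // long-lived RPC handler entry or a short-lived task.
                tasks.removeFirst();
            }
            tasks.addLast(task);
        }

        long countRpcHandlers() {
            return tasks.stream().filter(t -> t.rpcHandler).count();
        }
    }

    /**
     * Registers the handler entries once (as at startup), then adds a burst
     * of non-RPC tasks, and returns how many handler entries remain.
     */
    static long survivingHandlers(int maxSize, int handlers, int spike) {
        BoundedFifo monitor = new BoundedFifo(maxSize);
        for (int i = 0; i < handlers; i++) {
            monitor.add(new Task("handler-" + i, true));
        }
        for (int i = 0; i < spike; i++) {
            monitor.add(new Task("task-" + i, false));
        }
        return monitor.countRpcHandlers();
    }

    public static void main(String[] args) {
        // Spike stays under the bound: all handler entries are still listed.
        System.out.println(survivingHandlers(1000, 10, 500));  // prints 10
        // Spike fills the buffer: every handler entry has been evicted.
        System.out.println(survivingHandlers(1000, 10, 1000)); // prints 0
    }
}
```

Because the handler entries are the oldest items in the buffer, they are the first to be
evicted, and nothing in this sketch ever puts them back.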

We added a unit test that reproduces this, and the fix passes the test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
