hbase-dev mailing list archives

From "Ashu Pachauri (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-18549) Unclaimed replication queues can go undetected
Date Wed, 09 Aug 2017 20:59:00 GMT
Ashu Pachauri created HBASE-18549:
-------------------------------------

             Summary: Unclaimed replication queues can go undetected
                 Key: HBASE-18549
                 URL: https://issues.apache.org/jira/browse/HBASE-18549
             Project: HBase
          Issue Type: Bug
          Components: Replication
            Reporter: Ashu Pachauri
            Priority: Critical
             Fix For: 1.3.2


We have repeatedly come across situations where a ZooKeeper issue causes NodeFailoverWorker
to silently fail to pick up the replication queue of a dead region server. One example is when
the znode size for a particular queue exceeds the jute.maxBuffer value.

Other situations may lead to the same failure and likewise go undetected. We need a metric
for the number of unclaimed replication queues. Alerting on this metric will help mitigate
the problem and identify the underlying issues.
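
As a rough illustration of what such a metric could compute, here is a minimal sketch in Python. It is not HBase code: the function name, the input shapes, and the sample server names are all hypothetical. It assumes the caller has already read two things from ZooKeeper: the replication queues grouped by the region server that owns them, and the set of currently live region servers. A queue is treated as unclaimed while its owner is not in the live set.

```python
from typing import Dict, List, Set


def count_unclaimed_queues(queues_by_owner: Dict[str, List[str]],
                           live_servers: Set[str]) -> int:
    """Count replication queues whose owning region server is not live.

    queues_by_owner: hypothetical mapping of region server name to the
        replication queue ids still registered under that server.
    live_servers: hypothetical set of region server names currently alive.
    """
    return sum(
        len(queues)
        for owner, queues in queues_by_owner.items()
        if owner not in live_servers
    )


# Made-up sample data: rs2 is dead and its two queues were never picked up.
sample_queues = {
    "rs1,16020,1502300000000": ["peer1"],
    "rs2,16020,1502300000001": ["peer1", "peer2"],
}
sample_live = {"rs1,16020,1502300000000"}
unclaimed = count_unclaimed_queues(sample_queues, sample_live)
```

A value that stays above zero for longer than a normal failover takes would be the alerting condition; the threshold and the polling interval are left to the operator.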



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
