hbase-issues mailing list archives

From "Ashu Pachauri (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-18549) Unclaimed replication queues can go undetected
Date Wed, 09 Aug 2017 20:59:00 GMT
Ashu Pachauri created HBASE-18549:

             Summary: Unclaimed replication queues can go undetected
                 Key: HBASE-18549
                 URL: https://issues.apache.org/jira/browse/HBASE-18549
             Project: HBase
          Issue Type: Bug
          Components: Replication
            Reporter: Ashu Pachauri
            Priority: Critical
             Fix For: 1.3.2

We have repeatedly run into situations where ZooKeeper issues cause NodeFailoverWorker to
silently fail to pick up the replication queue of a dead region server. One example is when
the znode size for a particular queue exceeds the jute.maxBuffer value.

Other situations may lead to the same outcome and also go undetected. We need a metric for
the number of unclaimed replication queues. Alerting on this metric would help mitigate the
problem and identify the underlying issues.
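
As an illustration of what such a metric might count, here is a minimal sketch. It assumes the
caller has already obtained a map from region server name to the replication queue ids still
listed under that server, plus the set of live region servers. The class UnclaimedQueueCounter
and the method countUnclaimed are hypothetical names for this example and are not part of HBase.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration only; not an HBase class. */
public final class UnclaimedQueueCounter {
    private UnclaimedQueueCounter() {}

    /**
     * Counts replication queues that are still listed under a region server
     * that is no longer alive, i.e. queues nobody has claimed yet.
     *
     * @param queuesByServer region server name -> queue ids listed under it (assumed input)
     * @param liveServers    names of region servers currently alive (assumed input)
     */
    public static int countUnclaimed(Map<String, List<String>> queuesByServer,
                                     Set<String> liveServers) {
        int unclaimed = 0;
        for (Map.Entry<String, List<String>> entry : queuesByServer.entrySet()) {
            // A queue whose owner is dead has not been picked up by any failover worker.
            if (!liveServers.contains(entry.getKey())) {
                unclaimed += entry.getValue().size();
            }
        }
        return unclaimed;
    }

    public static void main(String[] args) {
        // Made-up server names and queue ids, for demonstration only.
        Map<String, List<String>> queues = Map.of(
            "rs1.example.com", List.of("peer1"),
            "rs2.example.com", List.of("peer1", "peer2"));
        Set<String> live = Set.of("rs1.example.com");
        System.out.println(countUnclaimed(queues, live));
    }
}
```

A queue can legitimately belong to a dead server for a short time while a normal failover is in
progress, so an alert would probably need to fire only when the count stays above zero for longer
than a chosen threshold.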

