hbase-dev mailing list archives

From "Ashu Pachauri (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-14621) ReplicationLogCleaner gets stuck when a regionserver crashes
Date Thu, 15 Oct 2015 18:54:05 GMT
Ashu Pachauri created HBASE-14621:
-------------------------------------

             Summary: ReplicationLogCleaner gets stuck when a regionserver crashes
                 Key: HBASE-14621
                 URL: https://issues.apache.org/jira/browse/HBASE-14621
             Project: HBase
          Issue Type: Bug
          Components: Replication
            Reporter: Ashu Pachauri
            Assignee: Ashu Pachauri
            Priority: Critical


The ReplicationLogCleaner has a bug that makes it get stuck in an infinite loop when a regionserver
crashes. The bug was introduced by the fix for HBASE-12865, which makes the loadWALsFromQueues
method retry whenever the cversion of the replication znode (the ZooKeeper child version, which is
incremented each time a child is added or removed) changes while the replication queues of the
regionservers are being loaded. However, when this scenario actually occurs (a regionserver crashes
in the middle of the operation), the retry never succeeds and the method loops forever.
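The report does not show the code or say exactly why the retry never succeeds. One plausible way a cversion-guarded retry can loop forever, offered here purely as an assumption, is if the baseline cversion is read once before the loop: after a single concurrent change, the baseline can never match again. The Java sketch below is hypothetical and self-contained; FakeQueuesNode, loadBaselineOutsideLoop and loadBaselineInsideLoop are illustrative names, not HBase or ZooKeeper APIs, and the maxRetries bound exists only so the demo terminates.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: this is NOT HBase's ReplicationLogCleaner code.
public class CversionRetrySketch {

  // Stand-in for the replication znode (assumption, not a ZooKeeper API):
  // a child list plus a cversion bumped on every child change.
  static final class FakeQueuesNode {
    private final List<String> replicators = new ArrayList<>();
    private int cversion = 0;
    private int readsUntilCrash; // simulated crash fires on this read (<= 0: never)

    FakeQueuesNode(int readsUntilCrash, String... initial) {
      this.readsUntilCrash = readsUntilCrash;
      for (String rs : initial) {
        replicators.add(rs);
      }
    }

    int getCversion() {
      return cversion;
    }

    List<String> listReplicators() {
      List<String> snapshot = new ArrayList<>(replicators);
      if (--readsUntilCrash == 0) {
        // Simulated regionserver crash mid-read: the children change once
        // and the cversion is bumped, so the snapshot just taken is stale.
        replicators.add("rs-claimed-" + cversion);
        cversion++;
      }
      return snapshot;
    }
  }

  // Assumed buggy shape (the report does not show the code): baseline read once, outside the loop.
  static Set<String> loadBaselineOutsideLoop(FakeQueuesNode node, int maxRetries) {
    int v0 = node.getCversion();
    for (int retry = 0; retry < maxRetries; retry++) {
      Set<String> snapshot = new HashSet<>(node.listReplicators());
      int v1 = node.getCversion();
      if (v0 == v1) {
        return snapshot; // consistent snapshot
      }
      // v0 is never refreshed, so once the cversion moves, v0 == v1 can never hold again.
    }
    return null; // an unbounded loop would spin here forever
  }

  // Converging shape: re-read the baseline at the start of every attempt.
  static Set<String> loadBaselineInsideLoop(FakeQueuesNode node, int maxRetries) {
    for (int retry = 0; retry < maxRetries; retry++) {
      int v0 = node.getCversion();
      Set<String> snapshot = new HashSet<>(node.listReplicators());
      int v1 = node.getCversion();
      if (v0 == v1) {
        return snapshot;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // The simulated crash fires during the first read in both runs.
    Set<String> stuck = loadBaselineOutsideLoop(new FakeQueuesNode(1, "rs-1"), 1000);
    Set<String> ok = loadBaselineInsideLoop(new FakeQueuesNode(1, "rs-1"), 1000);
    System.out.println("baseline outside loop converged: " + (stuck != null)); // false
    System.out.println("baseline inside loop converged: " + (ok != null));     // true
  }
}
```

In this sketch, re-reading the baseline on each attempt lets the loop converge as soon as the children stop changing; it assumes concurrent changes eventually stop, so a bounded retry count or backoff could be a further safeguard.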

This has very serious ramifications: while the cleaner is stuck, old WALs are never cleaned up, and
in a high-load environment the number of files in the oldWALs directory soon exceeds the inode
limit and the cluster goes down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
