hadoop-hdfs-dev mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-6306) Standby NN can hold FSDirectory's writeLock for a long time under heavy load
Date Tue, 29 Apr 2014 23:36:15 GMT
Ming Ma created HDFS-6306:
-----------------------------

             Summary: Standby NN can hold FSDirectory's writeLock for a long time under heavy load
                 Key: HDFS-6306
                 URL: https://issues.apache.org/jira/browse/HDFS-6306
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Ming Ma


The Standby NN uses FSEditLogLoader to apply edits to its namespace. It can hold FSDirectory's writeLock
for a long time when the active NN generates a large volume of edits.

{noformat}

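FSEditLogLoader#loadEditRecords (excerpt; "..." marks elided code)

    fsNamesys.writeLock();
    fsDir.writeLock();          // both locks are taken once, before the loop
    ...
    try {
      while (true) {            // one iteration per edit record
        try {
          FSEditLogOp op;
          try {
            op = in.readOp();
            ...
          }
          ...
        }
        ...
      }
    } finally {
      ...
      fsDir.writeUnlock();      // released only when the loop exits
      fsNamesys.writeUnlock();
    }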
{noformat}

With the fix in https://issues.apache.org/jira/browse/HDFS-5693, JMX response time is good
for the active NN because JMX no longer requires FSNamesystem's lock, even though it still needs
to acquire FSDirectory's readLock in FSDirectory's totalInodes. That isn't an issue for the active NN,
as each client RPC request holds the FSDirectory lock only for a short period of time. The
Standby NN, however, can hold FSDirectory's writeLock for the entire time it spends loading a
batch of edits, and JMX requests that call totalInodes are blocked until it releases the lock.
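The interaction described above can be modeled in plain Java. This is a minimal sketch under stated assumptions, not Hadoop code: `ReplayModel`, `replayBatch`, and `totalInodes` are hypothetical names, a single `ReentrantReadWriteLock` stands in for FSDirectory's lock, and each "edit" simply adds to an inode counter. While the batch is applied under one write lock, a second thread standing in for a JMX request cannot obtain the read lock.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical toy model of coarse-grained edit replay (not Hadoop code).
public class ReplayModel {
    // Stand-in for FSDirectory's lock.
    private final ReentrantReadWriteLock dirLock = new ReentrantReadWriteLock();
    private long inodeCount = 0;

    // Holds the write lock for the whole batch, mirroring loadEditRecords.
    // Returns whether a concurrent reader obtained the read lock mid-batch.
    boolean replayBatch(List<Integer> edits) {
        AtomicBoolean readerGotLock = new AtomicBoolean(false);
        dirLock.writeLock().lock();
        try {
            for (int i = 0; i < edits.size(); i++) {
                inodeCount += edits.get(i);  // "apply" one edit record
                if (i == 0) {
                    // Simulate a JMX request arriving on another thread mid-batch.
                    Thread jmx = new Thread(() -> {
                        boolean ok = dirLock.readLock().tryLock();  // non-blocking attempt
                        if (ok) {
                            dirLock.readLock().unlock();
                        }
                        readerGotLock.set(ok);
                    });
                    jmx.start();
                    try {
                        jmx.join();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        } finally {
            dirLock.writeLock().unlock();
        }
        return readerGotLock.get();
    }

    // Reader path, analogous to a totalInodes() that takes the read lock.
    long totalInodes() {
        dirLock.readLock().lock();
        try {
            return inodeCount;
        } finally {
            dirLock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReplayModel model = new ReplayModel();
        boolean gotLock = model.replayBatch(List.of(1, 1, 1));
        System.out.println("reader got lock mid-batch: " + gotLock);   // false
        System.out.println("inodes after batch: " + model.totalInodes()); // 3
    }
}
```

The non-blocking `tryLock()` is used only so the example terminates deterministically; a reader that calls `lock()` instead would simply wait until the whole batch has been applied.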

There are two ways to fix this:

1. Change the Standby NN to acquire FSDirectory's writeLock for each edit record, rather than once for the whole batch.
2. Change FSDirectory's totalInodes so that it does not take the readLock, allowing JMX requests to go through.
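Both proposed fixes can be sketched against the same kind of toy model. Again, this is hypothetical code and not the actual Hadoop patch: `ReplayFixes`, `replayPerRecord`, and `totalInodesLockFree` are illustrative names. Fix 1 takes and releases the write lock around each record, so a reader can get in between records; fix 2 keeps the inode count in an `AtomicLong` so it can be read without any lock, even while a writer holds it.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical toy model of the two proposed fixes (not Hadoop code).
public class ReplayFixes {
    // Stand-in for FSDirectory's lock.
    private final ReentrantReadWriteLock dirLock = new ReentrantReadWriteLock();
    // Fix 2: an AtomicLong can be read safely without taking dirLock.
    private final AtomicLong inodeCount = new AtomicLong();

    // Fix 1: acquire the write lock per edit record instead of per batch.
    // Returns whether a reader obtained the read lock between two records.
    boolean replayPerRecord(List<Integer> edits) {
        AtomicBoolean readerGotLock = new AtomicBoolean(false);
        for (int i = 0; i < edits.size(); i++) {
            dirLock.writeLock().lock();
            try {
                inodeCount.addAndGet(edits.get(i));  // "apply" one edit record
            } finally {
                dirLock.writeLock().unlock();
            }
            if (i == 0) {
                // A JMX-style request arriving between records 0 and 1.
                runOnOtherThread(() -> readerGotLock.set(tryReadLockOnce()));
            }
        }
        return readerGotLock.get();
    }

    // Fix 2: a totalInodes() variant that never touches the lock.
    long totalInodesLockFree() {
        return inodeCount.get();
    }

    // Reads the count from another thread while this thread holds the write
    // lock, as the original coarse-grained replay would. Returns the value seen.
    long readWhileWriterHoldsLock() {
        AtomicLong observed = new AtomicLong(-1);
        dirLock.writeLock().lock();
        try {
            runOnOtherThread(() -> observed.set(totalInodesLockFree()));
        } finally {
            dirLock.writeLock().unlock();
        }
        return observed.get();
    }

    private boolean tryReadLockOnce() {
        boolean ok = dirLock.readLock().tryLock();
        if (ok) {
            dirLock.readLock().unlock();
        }
        return ok;
    }

    private static void runOnOtherThread(Runnable r) {
        Thread t = new Thread(r);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ReplayFixes fixes = new ReplayFixes();
        System.out.println("reader got lock between records: "
                + fixes.replayPerRecord(List.of(1, 1, 1)));  // true
        System.out.println("lock-free read during write: "
                + fixes.readWhileWriterHoldsLock());         // 3
    }
}
```

As modeled here, the two approaches trade off differently: per-record locking keeps every read consistent with a whole record but adds a lock acquire/release for each edit, while a lock-free read taken during a batch may observe a count that reflects only part of that batch.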



--
This message was sent by Atlassian JIRA
(v6.2#6252)
