hadoop-hdfs-dev mailing list archives

From "He Tianyi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete
Date Wed, 11 Nov 2015 11:01:10 GMT
He Tianyi created HDFS-9412:

             Summary: getBlocks occupies FSLock and takes too long to complete
                 Key: HDFS-9412
                 URL: https://issues.apache.org/jira/browse/HDFS-9412
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: He Tianyi
            Assignee: He Tianyi

{{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and may then take a long time to complete
(probably several seconds if the number of blocks is large).
During this period, other threads attempting to acquire the write lock have to wait.
In an extreme case, all RPC handlers are occupied: one reader thread is calling {{getBlocks}} and
all the others are waiting for the write lock, so the RPC server appears hung. Unfortunately, this tends
to happen in heavily loaded clusters, since read operations come and go quickly (they do not need
to wait), leaving write operations waiting.

It looks like we could optimize this the way the DN block report was optimized in the past: split the operation
into smaller sub-operations and let other threads do their work between sub-operations.
The whole result would still be returned at once, though (one difference from the DN block report), but
there would be no more starvation.
I am not sure whether this will work. Any better ideas?
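
To make the proposal concrete, here is a minimal sketch of the idea, not actual HDFS code. The class {{ChunkedReadSketch}}, its methods, and the {{chunkSize}} parameter are hypothetical names used only for illustration, and the fair {{ReentrantReadWriteLock}} is an assumed stand-in for the namesystem lock. The read lock is released after each chunk so that waiting writers can get in, and the accumulated result is still returned in one piece.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; none of these names exist in HDFS.
public class ChunkedReadSketch {
    // Assumed stand-in for the namesystem lock (fair mode is an assumption).
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private final List<Long> blocks = new ArrayList<>();

    // Writer path: append one block id under the write lock.
    public void addBlock(long id) {
        lock.writeLock().lock();
        try {
            blocks.add(id);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reader path: copy at most chunkSize entries per read-lock hold,
    // then drop the lock so queued writers can run before the next chunk.
    public List<Long> getBlocksChunked(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Long> result = new ArrayList<>();
        int next = 0;          // cursor into the block list
        boolean done = false;
        while (!done) {
            lock.readLock().lock();
            try {
                // Re-read the size each time; the list may have changed
                // while the lock was released.
                int end = Math.min(next + chunkSize, blocks.size());
                for (; next < end; next++) {
                    result.add(blocks.get(next));
                }
                done = next >= blocks.size();
            } finally {
                lock.readLock().unlock();
            }
        }
        // The whole result is returned at once, as in the proposal.
        return result;
    }
}
```

One consequence of this design is that the result is no longer a point-in-time snapshot: writers may change the block list between chunks, so the caller would have to tolerate a slightly inconsistent view. Whether that is acceptable for {{getBlocks}} is part of the open question.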

