hadoop-hdfs-issues mailing list archives

From "He Tianyi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9412) getBlocks occupies FSLock and takes too long to complete
Date Tue, 08 Dec 2015 05:56:11 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046417#comment-15046417 ]

He Tianyi commented on HDFS-9412:
---------------------------------

[~andrew.wang] Perhaps switching to an unfair RWLock may cause other issues, since the machine running
the NameNode does not necessarily have an SMP architecture.

I think this is caused by having many small blocks in the cluster. {{getBlocks}} is called by the Balancer
and does not return until the blocks are exhausted or their total size reaches the requested size, and there
are actually many threads doing the same thing ({{dfs.balancer.dispatcherThreads}}).
Besides decreasing the number of threads, maybe we can also make this call faster.
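
To illustrate why many small blocks make the call long, here is a minimal sketch of the "accumulate until the requested size is reached" loop described above. It is a hypothetical model, not the actual {{NameNodeRpcServer}} code: {{GetBlocksSketch}}, {{Block}}, {{uniformBlocks}} and the single in-memory block list are made up for illustration, and a fair {{ReentrantReadWriteLock}} stands in for the FSNamesystem lock. The point is that the number of iterations done under the read lock is roughly the requested size divided by the average block size: a 2 GB request takes 2048 iterations with 1 MB blocks but only 16 with 128 MB blocks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of a Balancer-driven getBlocks call; not the actual NameNode code.
public class GetBlocksSketch {
    // Hypothetical stand-in for per-block metadata.
    static final class Block {
        final long id;
        final long numBytes;

        Block(long id, long numBytes) {
            this.id = id;
            this.numBytes = numBytes;
        }
    }

    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
    private final List<Block> blocksOnDatanode;

    GetBlocksSketch(List<Block> blocksOnDatanode) {
        this.blocksOnDatanode = blocksOnDatanode;
    }

    // Builds `count` blocks of identical size, for experiments only.
    static List<Block> uniformBlocks(int count, long numBytes) {
        List<Block> blocks = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            blocks.add(new Block(i, numBytes));
        }
        return blocks;
    }

    // Collects blocks until their total size reaches minSize or the list runs out.
    // The read lock is held for the whole scan, so time under the lock grows
    // with minSize / (average block size).
    List<Block> getBlocks(long minSize) {
        List<Block> result = new ArrayList<>();
        long total = 0;
        fsLock.readLock().lock();
        try {
            for (Block b : blocksOnDatanode) {
                if (total >= minSize) {
                    break;
                }
                result.add(b);
                total += b.numBytes;
            }
        } finally {
            fsLock.readLock().unlock();
        }
        return result;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        long request = 2048 * mb; // ask for 2 GB worth of blocks
        System.out.println(new GetBlocksSketch(uniformBlocks(10_000, mb)).getBlocks(request).size());       // 2048
        System.out.println(new GetBlocksSketch(uniformBlocks(10_000, 128 * mb)).getBlocks(request).size()); // 16
    }
}
```

Multiply that by the number of Balancer threads issuing such calls and the read lock can stay busy for a long stretch.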

> getBlocks occupies FSLock and takes too long to complete
> --------------------------------------------------------
>
>                 Key: HDFS-9412
>                 URL: https://issues.apache.org/jira/browse/HDFS-9412
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: He Tianyi
>            Assignee: He Tianyi
>
> {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and then may take a long time
to complete (possibly several seconds if there are too many blocks).
> During this period, other threads attempting to acquire the write lock will wait.
> In an extreme case, the RPC handlers are all occupied: one reader thread is calling {{getBlocks}}
and all other threads are waiting for the write lock, so the RPC server appears hung. Unfortunately,
this tends to happen in heavily loaded clusters, since read operations come and go quickly (they do not
need to wait), leaving write operations waiting.
> It looks like we can optimize this the way the DN block report was optimized in the past, by splitting
the operation into smaller sub-operations and letting other threads do their work between
sub-operations. The whole result is still returned at once, though (one difference from the DN block
report).
> I am not sure whether this will work. Any better ideas?
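
The stall described in the issue can be reproduced with a plain fair {{java.util.concurrent.locks.ReentrantReadWriteLock}}. This is a standalone sketch that assumes a fair lock as a stand-in for the FSNamesystem lock; {{FairLockStallDemo}} and the timeout value are made up. One thread holds the read lock as a long {{getBlocks}} would, a writer queues behind it, and a new short reader then cannot get in either, because the fair policy makes it wait behind the queued writer.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FairLockStallDemo {
    // Returns true if a fresh reader acquires the read lock within timeoutMs while
    // one reader holds the lock and a writer is queued behind it.
    static boolean newReaderGetsLock(long timeoutMs) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair ordering
        CountDownLatch readerHolding = new CountDownLatch(1);
        CountDownLatch releaseReader = new CountDownLatch(1);

        // Plays the role of a long-running getBlocks call holding the read lock.
        Thread longReader = new Thread(() -> {
            lock.readLock().lock();
            try {
                readerHolding.countDown();
                releaseReader.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });
        longReader.start();
        readerHolding.await();

        // A writer (e.g. a namespace mutation) queues behind the long reader.
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            lock.writeLock().unlock();
        });
        writer.start();
        while (!lock.hasQueuedThread(writer)) {
            Thread.sleep(1);
        }

        // A new, short reader. The timed tryLock honors the fairness policy;
        // the untimed tryLock() would barge past the queued writer.
        boolean acquired = lock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        if (acquired) {
            lock.readLock().unlock();
        }

        releaseReader.countDown();
        longReader.join();
        writer.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        // prints "new reader acquired lock: false"
        System.out.println("new reader acquired lock: " + newReaderGetsLock(100));
    }
}
```

Every handler that needs the lock, reader or writer, piles up the same way until the long reader finishes.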

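A minimal sketch of the "split into sub-operations" idea, again on a hypothetical model rather than the real NameNode data structures: the read lock is held for at most {{batchSize}} blocks at a time and released between batches, while the caller still receives one combined result. {{ChunkedGetBlocksSketch}}, {{batchSize}}, the lock-acquisition counter and the list-index cursor are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of getBlocks split into lock-sized batches; not the actual NameNode code.
public class ChunkedGetBlocksSketch {
    static final class Block {
        final long id;
        final long numBytes;

        Block(long id, long numBytes) {
            this.id = id;
            this.numBytes = numBytes;
        }
    }

    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
    private final List<Block> blocksOnDatanode;
    private final int batchSize;
    // Counts read-lock acquisitions, only to make the batching visible.
    private final AtomicInteger lockAcquisitions = new AtomicInteger();

    ChunkedGetBlocksSketch(List<Block> blocksOnDatanode, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.blocksOnDatanode = blocksOnDatanode;
        this.batchSize = batchSize;
    }

    static List<Block> uniformBlocks(int count, long numBytes) {
        List<Block> blocks = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            blocks.add(new Block(i, numBytes));
        }
        return blocks;
    }

    int lockAcquisitions() {
        return lockAcquisitions.get();
    }

    List<Block> getBlocks(long minSize) {
        List<Block> result = new ArrayList<>();
        long total = 0;
        int next = 0; // cursor into the block list, kept across batches
        boolean exhausted = false;
        while (total < minSize && !exhausted) {
            fsLock.readLock().lock();
            lockAcquisitions.incrementAndGet();
            try {
                int end = (int) Math.min((long) next + batchSize, blocksOnDatanode.size());
                while (next < end && total < minSize) {
                    Block b = blocksOnDatanode.get(next++);
                    result.add(b);
                    total += b.numBytes;
                }
                exhausted = next >= blocksOnDatanode.size();
            } finally {
                fsLock.readLock().unlock();
            }
            // The lock is free here, so a queued writer can get a turn before the next batch.
        }
        return result; // one combined result, as in the proposal
    }

    public static void main(String[] args) {
        ChunkedGetBlocksSketch s = new ChunkedGetBlocksSketch(uniformBlocks(10, 1), 3);
        // prints "10 blocks, 4 lock acquisitions"
        System.out.println(s.getBlocks(100).size() + " blocks, " + s.lockAcquisitions() + " lock acquisitions");
    }
}
```

The index cursor is the weak spot: if the block list changes while the lock is released, positions shift and blocks can be skipped or returned twice. A real version would need a cursor that tolerates concurrent modification, and the combined result would no longer be a single consistent snapshot.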


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
