hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5064) Standby checkpoints should not block concurrent readers
Date Mon, 05 Aug 2013 14:20:49 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729534#comment-13729534 ]

Kihwal Lee commented on HDFS-5064:
----------------------------------

Holding the FSNamesystem read lock for a long time should be avoided. Even on the active
NameNode (ANN), repeated getContentSummary() calls against big directory trees can degrade
performance significantly. I complained about the fairness setting, but realized that things
can get worse without it.

I think most of the writers on the standby NameNode (SBN) are datanodes. If this is true,
separating FSN and BlockManager locking will help. Last time I checked, we wanted a facility
to enforce the lock hierarchy before attempting to do this.
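
To make the "facility to enforce lock hierarchy" idea concrete, here is a minimal sketch, not anything that exists in HDFS. The class name `OrderedLock` and the numeric levels are hypothetical. It assumes each lock is assigned a level and that a thread may only acquire locks in strictly increasing level order (so, for example, FSN before BlockManager), and it fails fast on a violation instead of risking a deadlock.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: a lock that checks a per-thread acquisition order.
class OrderedLock {
    // Levels of the locks the current thread holds, most recent on top.
    private static final ThreadLocal<Deque<Integer>> HELD =
        ThreadLocal.withInitial(ArrayDeque::new);

    private final int level;
    private final ReentrantLock lock = new ReentrantLock();

    OrderedLock(int level) {
        this.level = level;
    }

    void lock() {
        Deque<Integer> held = HELD.get();
        Integer top = held.peek();
        // Reject any acquisition that does not move strictly down the hierarchy.
        // This sketch also rejects re-acquiring the same level (no reentrancy).
        if (top != null && top >= level) {
            throw new IllegalStateException(
                "lock order violation: holding level " + top + ", requested level " + level);
        }
        lock.lock();
        held.push(level);
    }

    void unlock() {
        Deque<Integer> held = HELD.get();
        // Require release in reverse order of acquisition.
        if (held.isEmpty() || held.peek() != level) {
            throw new IllegalStateException("unlock out of order for level " + level);
        }
        held.pop();
        lock.unlock();
    }

    public static void main(String[] args) {
        OrderedLock fsn = new OrderedLock(0);          // assumed outer lock
        OrderedLock blockManager = new OrderedLock(1); // assumed inner lock

        fsn.lock();
        blockManager.lock();   // allowed: 0 then 1
        blockManager.unlock();
        fsn.unlock();

        blockManager.lock();
        try {
            fsn.lock();        // rejected: 1 then 0
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        } finally {
            blockManager.unlock();
        }
    }
}
```

A real facility would also have to cover read/write modes and reentrancy, which this sketch deliberately leaves out.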

Or we could resort to an SBN-specific solution, since the checkpoint probably only needs to
block EditLogTailer and perhaps prevent concurrent checkpointing.
                
> Standby checkpoints should not block concurrent readers
> -------------------------------------------------------
>
>                 Key: HDFS-5064
>                 URL: https://issues.apache.org/jira/browse/HDFS-5064
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, namenode
>    Affects Versions: 2.1.1-beta
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>
> We've observed an issue which causes fetches of the {{/jmx}} page of the NN to take a
> long time to load when the standby is in the process of creating a checkpoint.
> Even though both creating the checkpoint and gathering the statistics for {{/jmx}} take
> only the FSNS read lock, the issue is that since the FSNS uses a _fair_ RW lock, a single
> writer attempting to get the lock will block all threads attempting to get only the read lock
> for the duration of the checkpoint. This will cause {{/jmx}}, and really any thread only attempting
> to get the read lock, to block for the duration of the checkpoint, even though they should
> be able to proceed concurrently with the checkpointing thread.
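
The blocking described in the issue can be reproduced outside HDFS with a plain `java.util.concurrent.locks.ReentrantReadWriteLock` constructed in fair mode. The sketch below is only an illustration under assumptions: the class `FairLockDemo`, the thread roles ("checkpointer", "writer"), and the 200 ms timeout are made up for the demo and are not NameNode code. One thread holds the read lock for a long time, optionally a writer queues behind it, and then a second reader tries to get the read lock with a timed `tryLock`, which honors the fairness policy.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class FairLockDemo {
    // Returns true if a second reader obtains the read lock within 200 ms
    // while a long-running reader (the stand-in for the checkpoint) holds it.
    static boolean readerGetsIn(boolean writerQueued) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair mode
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Stand-in for the checkpoint: takes the read lock and keeps it.
        Thread checkpointer = new Thread(() -> {
            lock.readLock().lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });
        // Stand-in for any writer: queues for the write lock.
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            lock.writeLock().unlock();
        });

        try {
            checkpointer.start();
            held.await();
            if (writerQueued) {
                writer.start();
                // Wait until the writer is actually parked in the lock's queue.
                while (!lock.hasQueuedThread(writer)) {
                    Thread.sleep(1);
                }
            }
            // Second reader, e.g. a /jmx request. The timed tryLock respects fairness.
            boolean acquired = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.readLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the long reader, and then the writer, finish
        }
    }

    public static void main(String[] args) {
        System.out.println("no writer queued, reader gets in: " + readerGetsIn(false));
        System.out.println("writer queued, reader gets in:    " + readerGetsIn(true));
    }
}
```

With no writer queued, the second reader shares the lock with the long reader. Once a writer is queued, the second reader waits behind it even though the lock is only read-held, which is the behavior reported for {{/jmx}}.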

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
