hadoop-hdfs-issues mailing list archives

From "Nathan Roberts (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8873) throttle directoryScanner
Date Thu, 24 Sep 2015 16:33:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906603#comment-14906603 ]

Nathan Roberts commented on HDFS-8873:

Thanks [~templedf]. I like that the stopwatch class makes this much cleaner. Just a couple
of comments:
- Shouldn't the isInterrupted() check throw an InterruptedException? Otherwise won't we just
break out of one level? It would probably be good to test shutdown on an actual cluster if
possible because you're exactly right that we could be in here a long time and it would be
good to make sure we don't affect shutdown of the datanode. This has been a problem in the
past and can have a serious impact on rolling upgrades.
- Nit: I find markRunning() and markWaiting() confusing (they seem backwards to me, because we
call markRunning() just before going to sleep).
- I wonder whether we should disallow extremely low duty cycles. At the minimum setting, it seems
a scan could take close to 24 hours. A minimum of 20% should keep us within an hour.
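
The first point above can be sketched in Java. This is a minimal illustration, not the code from the HDFS-8873 patch: DutyCycleThrottle, ScanSketch, and the runMsPerSec parameter are hypothetical names, and the nested loops only stand in for whatever the real scanner iterates over. The sketch uses Thread.interrupted() rather than isInterrupted(), so the interrupt flag is cleared before the exception is thrown. The idea is that a bare break on interruption leaves only the innermost loop, whereas a thrown InterruptedException unwinds every level at once.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical duty-cycle throttle; NOT the class from the HDFS-8873 patch. */
final class DutyCycleThrottle {
  private final long runNanosPerPeriod;
  private final long periodNanos = TimeUnit.SECONDS.toNanos(1);
  private long periodStart = System.nanoTime();

  /** @param runMsPerSec assumed knob: milliseconds of work allowed per second (1..1000). */
  DutyCycleThrottle(long runMsPerSec) {
    if (runMsPerSec < 1 || runMsPerSec > 1000) {
      throw new IllegalArgumentException("runMsPerSec must be in [1, 1000]");
    }
    this.runNanosPerPeriod = TimeUnit.MILLISECONDS.toNanos(runMsPerSec);
  }

  /** Call between units of work; sleeps out the rest of the period once the budget is used. */
  void pause() throws InterruptedException {
    // Propagate interruption as an exception so that callers in nested
    // loops unwind completely instead of leaving a single loop level.
    if (Thread.interrupted()) {
      throw new InterruptedException("scan interrupted");
    }
    long elapsed = System.nanoTime() - periodStart;
    if (elapsed >= runNanosPerPeriod) {
      long sleepNanos = periodNanos - elapsed;
      if (sleepNanos > 0) {
        // sleep() itself throws InterruptedException if interrupted while waiting.
        TimeUnit.NANOSECONDS.sleep(sleepNanos);
      }
      periodStart = System.nanoTime();
    }
  }
}

/** Stand-in for a nested directory walk (illustrative only). */
final class ScanSketch {
  static int scan(int dirs, int filesPerDir, DutyCycleThrottle throttle)
      throws InterruptedException {
    int visited = 0;
    for (int d = 0; d < dirs; d++) {
      for (int f = 0; f < filesPerDir; f++) {
        visited++;         // one unit of scan work
        throttle.pause();  // may throw and exit both loops at once
      }
    }
    return visited;
  }
}
```

Because scan() declares InterruptedException, whoever owns the scanning thread has to decide explicitly how to stop, which is the behavior a datanode shutdown would rely on.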

> throttle directoryScanner
> -------------------------
>                 Key: HDFS-8873
>                 URL: https://issues.apache.org/jira/browse/HDFS-8873
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.7.1
>            Reporter: Nathan Roberts
>            Assignee: Daniel Templeton
>         Attachments: HDFS-8873.001.patch, HDFS-8873.002.patch, HDFS-8873.003.patch, HDFS-8873.004.patch, HDFS-8873.005.patch, HDFS-8873.006.patch, HDFS-8873.007.patch, HDFS-8873.008.patch
> The new 2-level directory layout can make directory scans expensive in terms of disk seeks (see HDFS-8791 for details).
> It would be good if the directoryScanner() had a configurable duty cycle that would reduce its impact on disk performance (much like the approach in HDFS-8617).
> Without such a throttle, disks can go 100% busy for many minutes at a time (assuming the common case of all inodes in cache but no directory blocks cached, 64K seeks are required for a full directory listing, which translates to 655 seconds).
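
The arithmetic behind the 655-second figure can be checked with a short Java sketch. The issue text does not state a per-seek cost; roughly 10 ms per seek is an assumption inferred from 64K seeks mapping to about 655 seconds. ScanTimeEstimate and runMsPerSec are hypothetical names used only for illustration, and the model assumes the scan is purely seek-bound and that throttling stretches wall-clock time in proportion to the duty cycle.

```java
/** Back-of-the-envelope scan time model (illustrative; names are hypothetical). */
final class ScanTimeEstimate {

  /** Time for a seek-bound scan with no throttling. */
  static long unthrottledMillis(long seeks, long msPerSeek) {
    return seeks * msPerSeek;
  }

  /**
   * Wall-clock time when the scanner may work only runMsPerSec
   * milliseconds out of every second (a duty cycle of runMsPerSec / 1000).
   */
  static long throttledMillis(long workMillis, long runMsPerSec) {
    if (runMsPerSec < 1 || runMsPerSec > 1000) {
      throw new IllegalArgumentException("runMsPerSec must be in [1, 1000]");
    }
    return workMillis * 1000 / runMsPerSec;
  }

  public static void main(String[] args) {
    // 64K seeks at an ASSUMED 10 ms each: 655,360 ms, i.e. about 655 s.
    long work = unthrottledMillis(64 * 1024, 10);
    System.out.println("unthrottled ms: " + work);

    // 20% duty cycle (200 ms of work per second): 3,276,800 ms, about 54.6 minutes.
    System.out.println("20% duty ms:    " + throttledMillis(work, 200));
  }
}
```

Under these assumptions a 20% duty cycle stretches the scan to a little under 55 minutes, which is consistent with the comment above that a 20% minimum should keep a scan within an hour.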

This message was sent by Atlassian JIRA