hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8873) throttle directoryScanner
Date Tue, 15 Sep 2015 01:10:47 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744628#comment-14744628

Colin Patrick McCabe commented on HDFS-8873:

bq. Encapsulating the throttle stuff requires a reasonable abstraction of what a throttle
is. The various kinds of throttles (time, file, iops, ...) are all pretty different and aren't
easy to overlay with a single abstraction. I decided to give up on the idea of making the
throttle type selectable. The limit therefore always means the same thing, and so I think
it's fair to leave it's name as is.

I agree that it is probably premature to have a single Throttle base class that can do all
the various possible things you might want a Throttle to do.  But it doesn't follow from that
that we need to give up on the idea of making the throttle type selectable in the future.

Also, configuration keys which are specified in terms of milliseconds should always end in
ms.  It causes a huge amount of confusion when times come without units.  It is obvious to
you-- the author of the code-- that it is in milliseconds, but it is not obvious to users
or other developers.

Why not just have a single class called {{TimeBasedThrottle}} which does whatever you want
your time-based throttle to do?  You can make it a standalone class that doesn't extend or
implement anything, and even create an instance of it that does nothing when throttling is
turned off.  But keep the flexible configuration mechanism that we discussed earlier.  That
way, if someone wants to do something more elaborate later, they can.

{{hadoop-hdfs-project/hadoop-hdfs/now}}: did you intend to put this in the patch?

398	      "dfs.datanode.directoryscan.throttle.limit";
Should end with "ms" to indicate the limit is in milliseconds.  Or ideally "ms.per.sec"

-    if (!retainDiffs) clear();
+    if (!retainDiffs) {
+      clear();
+    }
Can we move small whitespace cleanups like this to a follow-on change?  It just makes backports
a pain because it creates unnecessary conflicts, and tends to obscure what this JIRA is about
when people are reading the change log.

+        } //end for
+      } //end synchronized
+    } // end if
If we're changing this, then let's get rid of the COBOLisms.  A close brace is enough to know
that the block is closed.

> throttle directoryScanner
> -------------------------
>                 Key: HDFS-8873
>                 URL: https://issues.apache.org/jira/browse/HDFS-8873
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.7.1
>            Reporter: Nathan Roberts
>            Assignee: Daniel Templeton
>         Attachments: HDFS-8873.001.patch, HDFS-8873.002.patch
> The new 2-level directory layout can make directory scans expensive in terms of disk
seeks (see HDFS-8791) for details. 
> It would be good if the directoryScanner() had a configurable duty cycle that would reduce
its impact on disk performance (much like the approach in HDFS-8617). 
> Without such a throttle, disks can go 100% busy for many minutes at a time (assuming
the common case of all inodes in cache but no directory blocks cached, 64K seeks are required
for full directory listing which translates to 655 seconds) 

This message was sent by Atlassian JIRA

View raw message