hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6900) Eliminate DU thread per block pool slice
Date Mon, 25 Aug 2014 23:16:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109954#comment-14109954 ]

Colin Patrick McCabe commented on HDFS-6900:

I forget the JIRA numbers, but this has been discussed a bunch already.  The short answer is
that we can't assume that HDFS is the only thing using the disks, so we can't use df, which
reports usage for the whole filesystem, to find out how much space the DataNode is using.

That being said, my experience has been that most users would be absolutely happy with df
rather than du, because most users don't share their HDFS disks with other systems / nodes.
In fact, I even saw a HOWTO online that instructed users to symlink {{/usr/bin/du}} to {{/usr/bin/df}}
when using Hadoop :(

It would be nice if we could somehow default to df for those people.
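
To make the du/df distinction above concrete, here is a minimal Python sketch (not DataNode code). The helper names du_bytes and df_used_bytes and the directory names dfs_data and other_app are hypothetical, made up for illustration. The du-style helper sums apparent file sizes, which is a simplification: the real du reports allocated blocks.

```python
import os
import shutil
import tempfile


def du_bytes(path):
    """du-style: sum the apparent sizes of regular files under one directory tree."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def df_used_bytes(path):
    """df-style: bytes used on the whole filesystem that contains path."""
    return shutil.disk_usage(path).used


def demo():
    # Hypothetical layout: a DataNode directory sharing a volume with another app.
    with tempfile.TemporaryDirectory() as volume:
        datanode_dir = os.path.join(volume, "dfs_data")
        other_dir = os.path.join(volume, "other_app")
        os.makedirs(datanode_dir)
        os.makedirs(other_dir)
        with open(os.path.join(datanode_dir, "blk_1"), "wb") as f:
            f.write(b"\0" * 4096)
        with open(os.path.join(other_dir, "app.log"), "wb") as f:
            f.write(b"\0" * 8192)
        # du sees only the tree it is pointed at; df covers everything on the
        # filesystem, including other_app and anything else stored there.
        return du_bytes(datanode_dir), du_bytes(other_dir), df_used_bytes(datanode_dir)


if __name__ == "__main__":
    datanode_used, other_used, filesystem_used = demo()
    print("du-style, DataNode dir:", datanode_used)
    print("du-style, other app dir:", other_used)
    print("df-style, whole filesystem:", filesystem_used)
```

On a disk dedicated to HDFS the two numbers should be close, which is why df would be good enough for most users. On a shared disk the df-style figure also counts the other application's data.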

> Eliminate DU thread per block pool slice
> ----------------------------------------
>                 Key: HDFS-6900
>                 URL: https://issues.apache.org/jira/browse/HDFS-6900
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: Arpit Agarwal
> We use one DU thread per block pool slice to compute disk usage information. In addition
> to the thread overhead, this results in the disk usage information being out of date for up
> to 10 minutes at a time. We can refresh it more frequently, but then we'd be launching a shell
> command per block pool slice even more often.

This message was sent by Atlassian JIRA
