hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1463) dfs should report total size of all the space that dfs is using
Date Tue, 26 Jun 2007 19:16:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508291 ]

Hairong Kuang commented on HADOOP-1463:

To summarize what we have discussed:

each data node's disk space = dfs used space + reserved space + remaining space

where dfs used space is a summation of all data dir sizes, reserved space is reserved for
non-dfs usage whether it is used or unused, and remaining space is for future dfs usage. 

dfs capacity = dfs used space + remaining space

Each data node sends its dfs capacity and remaining space to the namenode at each heartbeat.
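
The accounting above can be sketched in a few lines. This is only an illustration of the two identities, not the datanode code: the class and field names are hypothetical, and clamping the remaining space at zero is an assumption the discussion does not cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNodeSpace:
    """Per-datanode space figures in bytes (hypothetical names)."""

    disk_space: int  # total size of the data volumes, e.g. as reported by "df"
    reserved: int    # set aside for non-dfs use, whether used or unused
    dfs_used: int    # summation of all data dir sizes

    @property
    def remaining(self) -> int:
        # disk space = dfs used + reserved + remaining, solved for remaining.
        # Clamping at zero is an assumption, not part of the proposal.
        return max(0, self.disk_space - self.reserved - self.dfs_used)

    @property
    def capacity(self) -> int:
        # dfs capacity = dfs used space + remaining space
        return self.dfs_used + self.remaining

    def heartbeat(self) -> dict:
        # The two figures the data node sends to the namenode.
        return {"capacity": self.capacity, "remaining": self.remaining}


node = DataNodeSpace(disk_space=1000, reserved=100, dfs_used=300)
print(node.heartbeat())  # {'capacity': 900, 'remaining': 600}
```

With these two figures, dfs used space can be recovered on the receiving side as capacity minus remaining (here 900 - 600 = 300).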

I plan to run "df" when the datanode starts to get the data node's disk space and the reserved
space. I plan to keep track of dfs used space by running "du" whenever a block report is sent
and adjusting the figure whenever a block is written or deleted.
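
A rough sketch of that bookkeeping, assuming a full scan at block-report time and cheap increments in between. The class and method names are hypothetical, and the directory walk is only a stand-in for "du" (it sums apparent file sizes and skips symlinks), just as `shutil.disk_usage` would be a stand-in for "df".

```python
import os
import tempfile


def du(path: str) -> int:
    """Sum the sizes of regular files under path, roughly like "du"."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


class UsedSpaceTracker:
    """Tracks dfs used space between full scans (hypothetical helper)."""

    def __init__(self, data_dirs):
        self.data_dirs = list(data_dirs)
        self.dfs_used = 0
        self.rescan()

    def rescan(self) -> None:
        # Full recount, intended to run when a block report is sent.
        self.dfs_used = sum(du(d) for d in self.data_dirs)

    def block_written(self, num_bytes: int) -> None:
        self.dfs_used += num_bytes

    def block_deleted(self, num_bytes: int) -> None:
        self.dfs_used -= num_bytes


with tempfile.TemporaryDirectory() as data_dir:
    with open(os.path.join(data_dir, "blk_1"), "wb") as f:
        f.write(b"x" * 10)
    tracker = UsedSpaceTracker([data_dir])
    tracker.block_written(5)   # adjusted in memory, no rescan needed
    print(tracker.dfs_used)
    tracker.rescan()           # the next scan corrects any drift
    print(tracker.dfs_used)
```

The periodic rescan bounds how far the in-memory figure can drift from what is actually on disk.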

Please comment if you have any other opinion.

Regarding the reserved space, currently hadoop-default.xml supports the following two properties.
Shall we enforce that only one of them is non-zero?
  <description>Reserved space in bytes. Always leave this much space free for non dfs

  <description>When calculating remaining space, only use this percentage of the real
available space
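
If the answer is yes, the check itself would be small. The sketch below takes the question literally (at most one of the two values may be non-zero). The function and parameter names are hypothetical and do not correspond to actual property names, and how the percentage property should be interpreted is left open here.

```python
def check_reserved_settings(reserved_bytes: int, percent_setting: float) -> None:
    """Reject configurations where both reserved-space settings are non-zero.

    Both parameter names are hypothetical stand-ins for the two properties.
    """
    if reserved_bytes < 0 or percent_setting < 0:
        raise ValueError("reserved-space settings must not be negative")
    if reserved_bytes != 0 and percent_setting != 0:
        raise ValueError("only one reserved-space setting may be non-zero")


check_reserved_settings(1024, 0)    # accepted: only the byte value is set
check_reserved_settings(0, 0.9)     # accepted: only the percentage is set
try:
    check_reserved_settings(1024, 0.9)
except ValueError as err:
    print(err)
```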

> dfs should report total size of all the space that dfs is using
> ---------------------------------------------------------------
>                 Key: HADOOP-1463
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1463
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.12.3
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.14.0
> Currently the namenode reports two statistics back to the client:
> 1. The total capacity of dfs. This is the sum of all datanodes' capacities, each of which
> is calculated by the datanode summing the disk space of all its data directories.
> 2. The total remaining space of dfs. This is the sum of all datanodes' remaining space.
> Each datanode's remaining space is calculated using the following formula: remaining space
> = unused space - capacity*unusableDiskPercentage - reserved space. So the remaining space
> shows how much space the dfs can still use, but it does not show the size of the unused space.
> Each dfs client calculates the total dfs used space by subtracting the remaining space from
> the total capacity. So the used space does not accurately show the space that dfs is using.
> However, it is a very important number that dfs should provide.
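
A small worked example of the formula quoted above shows why the client-side figure is off. The numbers are made up, the function name is hypothetical, and treating unusableDiskPercentage as a fraction between 0 and 1 is an assumption.

```python
def legacy_remaining(unused, capacity, unusable_disk_percentage, reserved):
    # remaining space = unused space - capacity*unusableDiskPercentage - reserved space
    return unused - capacity * unusable_disk_percentage - reserved


capacity = 1000       # sum of the data directories' disk space
actual_dfs_used = 300
unused = 700          # assume nothing but dfs data is on the disk

remaining = legacy_remaining(unused, capacity, 0.25, reserved=100)
client_used = capacity - remaining  # what a dfs client would compute

print(remaining)    # 350.0
print(client_used)  # 650.0, although dfs actually uses 300
```

In this example the client-side figure counts the unusable and reserved space as if dfs had used it.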

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
