hadoop-hdfs-issues mailing list archives

From "zhouyingchao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-8045) Incorrect calculation of NonDfsUsed and Remaining
Date Thu, 02 Apr 2015 13:29:54 GMT

     [ https://issues.apache.org/jira/browse/HDFS-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhouyingchao updated HDFS-8045:
-------------------------------
    Description: 
After reserving some space via the parameter "dfs.datanode.du.reserved", we noticed that the
namenode usually reports NonDfsUsed of datanodes as 0 even after we actually write some data to
the volume. After some investigation, we think there is an issue in the calculation in
FsVolumeImpl.getAvailable. The explanation follows.

For a volume, let Raw denote the raw capacity, DfsUsed the space consumed by HDFS blocks,
Reserved the reservation made through "dfs.datanode.du.reserved", RbwReserved the space
reserved for rbw blocks, and NDfsUsed the real value of NonDfsUsed (which includes non-HDFS
files and metadata consumed by the local filesystem).
In the current implementation, the available space of a volume is actually calculated as
{code}
min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - NDfsUsed}
{code}
Later on, the namenode calculates NonDfsUsed of the volume as
{code}
Raw - Reserved - DfsUsed - min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - NDfsUsed}
{code}

Given these two calculations, we end up with:
{code}
if (Reserved + RbwReserved > NDfsUsed) NonDfsUsed = RbwReserved;
else NonDfsUsed = NDfsUsed - Reserved;
{code}
Either way, the result is far from the correct value, NDfsUsed.
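
To make the two cases concrete, here is a minimal Java sketch that plugs numbers into the formulas above. It only models the arithmetic as described in this issue; it is not the actual FsVolumeImpl or namenode code. The class name, the method names, and the sample volume (sizes in GB) are made up for illustration.

```java
public class NonDfsUsedModel {

    /** Available space, following the min{...} formula in the description. */
    public static long available(long raw, long reserved, long dfsUsed,
                                 long rbwReserved, long nDfsUsed) {
        long byReservation = raw - reserved - dfsUsed - rbwReserved;
        long byDiskUsage = raw - dfsUsed - nDfsUsed;
        return Math.min(byReservation, byDiskUsage);
    }

    /** NonDfsUsed as derived later on: Raw - Reserved - DfsUsed - available. */
    public static long reportedNonDfsUsed(long raw, long reserved, long dfsUsed,
                                          long rbwReserved, long nDfsUsed) {
        return raw - reserved - dfsUsed
                - available(raw, reserved, dfsUsed, rbwReserved, nDfsUsed);
    }

    public static void main(String[] args) {
        // Hypothetical volume: Raw=1000, Reserved=100, DfsUsed=300.

        // Reserved + RbwReserved > NDfsUsed (110 > 50): result is RbwReserved.
        System.out.println(reportedNonDfsUsed(1000, 100, 300, 10, 50));  // 10

        // Reserved + RbwReserved <= NDfsUsed (110 <= 200): result is NDfsUsed - Reserved.
        System.out.println(reportedNonDfsUsed(1000, 100, 300, 10, 200)); // 100

        // No rbw reservation and NDfsUsed below Reserved: result is 0,
        // even though 50 GB of non-HDFS data is on the volume.
        System.out.println(reportedNonDfsUsed(1000, 100, 300, 0, 50));   // 0
    }
}
```

The last call matches what we observed: with no rbw blocks in flight, NonDfsUsed shows up as 0 for as long as the real non-HDFS usage stays below the reservation.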

After investigating the implementation, we believe that Reserved and RbwReserved should be
subtracted from available in getAvailable, since they are not actually available to HDFS in any
way.  I'll post a patch soon.
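
As a rough illustration of that proposal, the sketch below shows one possible reading of it: take the free space the local filesystem reports (modelled here as Raw - DfsUsed - NDfsUsed) and subtract both reservations. This is an assumption about the intent, not the content of the patch. The class and method names are hypothetical, and the clamp at zero is my own addition.

```java
public class ProposedAvailableSketch {

    /** One reading of the proposal: free space on disk minus both reservations. */
    public static long proposedAvailable(long raw, long reserved, long dfsUsed,
                                         long rbwReserved, long nDfsUsed) {
        long freeOnDisk = raw - dfsUsed - nDfsUsed;
        // Assumption: never report a negative amount of available space.
        return Math.max(0L, freeOnDisk - reserved - rbwReserved);
    }

    /** NonDfsUsed derived with the unchanged formula Raw - Reserved - DfsUsed - available. */
    public static long reportedNonDfsUsed(long raw, long reserved, long dfsUsed,
                                          long rbwReserved, long nDfsUsed) {
        return raw - reserved - dfsUsed
                - proposedAvailable(raw, reserved, dfsUsed, rbwReserved, nDfsUsed);
    }

    public static void main(String[] args) {
        // Hypothetical volume in GB: Raw=1000, Reserved=100, DfsUsed=300, RbwReserved=10.
        System.out.println(reportedNonDfsUsed(1000, 100, 300, 10, 50));  // 60
        System.out.println(reportedNonDfsUsed(1000, 100, 300, 10, 200)); // 210
    }
}
```

Under this reading, and as long as the clamp does not kick in, the derived NonDfsUsed works out to NDfsUsed + RbwReserved, so it follows the real non-HDFS usage instead of collapsing to RbwReserved or NDfsUsed - Reserved.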

  was:
After reserving some space via the parameter "dfs.datanode.du.reserved", we noticed that the
namenode usually reports NonDfsUsed of datanodes as 0 even after we actually write some data to
the volume. After some investigation, we think there is an issue in the calculation in
FsVolumeImpl.getAvailable. The explanation follows.

For a volume, let Raw denote the raw capacity, DfsUsed the space consumed by HDFS blocks,
Reserved the reservation made through "dfs.datanode.du.reserved", RbwReserved the space
reserved for rbw blocks, and NDfsUsed the real value of NonDfsUsed (which includes non-HDFS
files and metadata consumed by the local filesystem).
In the current implementation, the available space of a volume is actually calculated as
min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - NDfsUsed}.
Later on, the namenode calculates NonDfsUsed of the volume as "Raw - Reserved - DfsUsed -
min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - NDfsUsed}".

Given these two calculations, we end up with:
If "Reserved + RbwReserved > NDfsUsed", then the calculated NonDfsUsed will be RbwReserved.
Otherwise, if "Reserved + RbwReserved < NDfsUsed", then the calculated NonDfsUsed will
be "NDfsUsed - Reserved". Either way, the result is far from the correct value, NDfsUsed.

After investigating the implementation, we believe that Reserved and RbwReserved should be
subtracted from available in getAvailable, since they are not actually available to HDFS in any
way.  I'll post a patch soon.


> Incorrect calculation of NonDfsUsed and Remaining
> -------------------------------------------------
>
>                 Key: HDFS-8045
>                 URL: https://issues.apache.org/jira/browse/HDFS-8045
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.6.0
>            Reporter: zhouyingchao
>            Assignee: zhouyingchao
>         Attachments: HDFS-8045-001.patch
>
>
> After reserving some space via the parameter "dfs.datanode.du.reserved", we noticed that the
> namenode usually reports NonDfsUsed of datanodes as 0 even after we actually write some data
> to the volume. After some investigation, we think there is an issue in the calculation in
> FsVolumeImpl.getAvailable. The explanation follows.
> For a volume, let Raw denote the raw capacity, DfsUsed the space consumed by HDFS blocks,
> Reserved the reservation made through "dfs.datanode.du.reserved", RbwReserved the space
> reserved for rbw blocks, and NDfsUsed the real value of NonDfsUsed (which includes non-HDFS
> files and metadata consumed by the local filesystem).
> In the current implementation, the available space of a volume is actually calculated as
> {code}
> min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - NDfsUsed}
> {code}
> Later on, the namenode calculates NonDfsUsed of the volume as
> {code}
> Raw - Reserved - DfsUsed - min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - NDfsUsed}
> {code}
> Given these two calculations, we end up with:
> {code}
> if (Reserved + RbwReserved > NDfsUsed) NonDfsUsed = RbwReserved;
> else NonDfsUsed = NDfsUsed - Reserved;
> {code}
> Either way, the result is far from the correct value, NDfsUsed.
> After investigating the implementation, we believe that Reserved and RbwReserved should be
> subtracted from available in getAvailable, since they are not actually available to HDFS in
> any way.  I'll post a patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
