hadoop-hdfs-issues mailing list archives

From "Tsz Wo Nicholas Sze (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9502) DiskBalancer : Replace Node and Data Density with Weighted Mean and Variance
Date Fri, 11 Dec 2015 21:49:46 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053633#comment-15053633 ]

Tsz Wo Nicholas Sze commented on HDFS-9502:

Thanks Anu.  Copied [my earlier comment|https://issues.apache.org/jira/browse/HDFS-1312?focusedCommentId=15012417&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15012417]
(with some bug fixes) below.

We may simply formulate the calculation using weighted mean and weighted variance.
- dfsUsedRatio_i for storage i is defined the same as before, i.e.
    dfsUsedRatio_i = dfsUsed_i / capacity_i.
- Define the normalized weight using capacity as
    w_i = capacity_i / sum(capacity_j).
- Then, define
    nodeWeightedMean = sum(w_j * dfsUsedRatio_j), and
    nodeWeightedVariance = sum(w_j * (dfsUsedRatio_j - nodeWeightedMean)^2).
We use nodeWeightedVariance (instead of nodeDataDensity) to do the comparison.  Note that nodeWeightedMean
is the same as idealStorage.

- Note also that the calculation of nodeWeightedVariance can be simplified as
    nodeWeightedVariance = sum(w_j * dfsUsedRatio_j^2) - nodeWeightedMean^2.
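
For illustration, the formulas above could be sketched in Java roughly as follows. This is not taken from the DiskBalancer source: the class name WeightedStats, its method names, and the use of plain long[] arrays for dfsUsed and capacity are all assumptions made for this example. It computes the variance both from the definition and from the simplified form, so the two can be compared.

```java
/** Illustrative sketch only; names and input types are hypothetical. */
final class WeightedStats {
    private WeightedStats() {}

    /** Sums capacities and rejects inputs for which the weights are undefined. */
    private static double totalCapacity(long[] dfsUsed, long[] capacity) {
        if (dfsUsed.length != capacity.length || capacity.length == 0) {
            throw new IllegalArgumentException("need equally sized, non-empty arrays");
        }
        double total = 0.0;
        for (long c : capacity) {
            if (c <= 0) {
                throw new IllegalArgumentException("capacity must be positive");
            }
            total += c;
        }
        return total;
    }

    /** nodeWeightedMean = sum(w_j * dfsUsedRatio_j). */
    static double weightedMean(long[] dfsUsed, long[] capacity) {
        double total = totalCapacity(dfsUsed, capacity);
        double mean = 0.0;
        for (int i = 0; i < capacity.length; i++) {
            double w = capacity[i] / total;                   // w_i
            double ratio = (double) dfsUsed[i] / capacity[i]; // dfsUsedRatio_i
            mean += w * ratio;
        }
        return mean;
    }

    /** nodeWeightedVariance = sum(w_j * (dfsUsedRatio_j - nodeWeightedMean)^2). */
    static double weightedVariance(long[] dfsUsed, long[] capacity) {
        double total = totalCapacity(dfsUsed, capacity);
        double mean = weightedMean(dfsUsed, capacity);
        double variance = 0.0;
        for (int i = 0; i < capacity.length; i++) {
            double w = capacity[i] / total;
            double diff = (double) dfsUsed[i] / capacity[i] - mean;
            variance += w * diff * diff;
        }
        return variance;
    }

    /** Simplified form: sum(w_j * dfsUsedRatio_j^2) - nodeWeightedMean^2. */
    static double weightedVarianceSimplified(long[] dfsUsed, long[] capacity) {
        double total = totalCapacity(dfsUsed, capacity);
        double mean = 0.0;
        double meanOfSquares = 0.0;
        for (int i = 0; i < capacity.length; i++) {
            double w = capacity[i] / total;
            double ratio = (double) dfsUsed[i] / capacity[i];
            mean += w * ratio;
            meanOfSquares += w * ratio * ratio;
        }
        // Rounding can push the difference slightly below zero; clamp it.
        return Math.max(0.0, meanOfSquares - mean * mean);
    }
}
```

As a check, two storages with capacity 100 and 300 and dfsUsed 50 and 60 have ratios 0.5 and 0.2 and weights 0.25 and 0.75, so nodeWeightedMean = 0.275 (the same as total used over total capacity, 110/400) and both variance methods give 0.016875. The simplified form needs only one pass over the storages, but it subtracts two nearly equal numbers when the disks are almost balanced, so the definition-based form may be the safer choice if precision near zero matters.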

> DiskBalancer : Replace Node and Data Density with Weighted Mean and Variance
> ----------------------------------------------------------------------------
>                 Key: HDFS-9502
>                 URL: https://issues.apache.org/jira/browse/HDFS-9502
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Anu Engineer
>            Assignee: Anu Engineer
> We use notions called Node and Data Density, which are similar to weighted mean and variance.
> Make sure that the computations map directly to these concepts, since they are easier to understand
> than the density as currently defined in Disk Balancer.

