hadoop-common-dev mailing list archives

From "Sameer Paranjpye (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-620) replication factor should be calculated based on actual dfs block sizes at the NameNode.
Date Fri, 10 Nov 2006 20:31:39 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-620?page=all ]

Sameer Paranjpye updated HADOOP-620:
------------------------------------

    Component/s: dfs
    Description: 
Currently 'dfs -report' calculates the replication factor as follows:
     (totalCapacity - totalDiskRemaining) / (total size of dfs files in the namespace)

The problem is that this includes disk space used by non-dfs files (e.g. map/reduce jobs)
on the datanode. On my single-node test, I get a replication factor of 100, since I have a
1 GB dfs file without replication and there is 99 GB of unrelated data on the same volume.

Ideally the namenode should calculate it as: (total size of all the blocks known to it) /
(total size of files in the namespace).
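The difference between the two formulas can be sketched with the numbers from the single-node test above. This is a hypothetical illustration, not Hadoop code; all values and variable names here are assumptions.

```java
// Sketch using the single-node example: a 100 GB volume holding a 1 GB dfs
// file (no replication) plus 99 GB of unrelated non-dfs data, so the volume
// is full. All names and values are illustrative.
public class ReplicationFactorSketch {
    public static void main(String[] args) {
        long GB = 1024L * 1024 * 1024;
        long totalCapacity      = 100 * GB; // volume size
        long totalDiskRemaining = 0;        // volume is full
        long totalDfsFileSize   = 1 * GB;   // dfs files in the namespace
        long totalBlockSize     = 1 * GB;   // dfs blocks actually on datanodes

        // Current formula: the 99 GB of non-dfs data inflates the result.
        double current = (double) (totalCapacity - totalDiskRemaining) / totalDfsFileSize;

        // Proposed formula: only blocks known to the namenode are counted.
        double proposed = (double) totalBlockSize / totalDfsFileSize;

        System.out.println(current);  // 100.0
        System.out.println(proposed); // 1.0
    }
}
```

With these inputs the current formula reports a factor of 100 while the proposed one correctly reports 1.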

The initial proposal for keeping 'total size of all the blocks' up to date is to track it
in the datanode descriptor and update it when the namenode receives a block report from
that datanode (and subtract it when the datanode is removed).
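The proposed bookkeeping could be sketched as follows. The class and method names (DatanodeDescriptor, processBlockReport, and so on) are illustrative assumptions, not the actual Hadoop API; the point is only the incremental update on block reports and the subtraction on datanode removal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed bookkeeping; names are illustrative.
class DatanodeDescriptor {
    long totalBlockSize; // sum of block sizes from this node's last report
}

class NameNodeSketch {
    private final Map<String, DatanodeDescriptor> datanodes = new HashMap<>();
    private long totalBlockSize; // cluster-wide sum, maintained incrementally

    // On a block report, replace the node's old contribution with the new one.
    void processBlockReport(String nodeId, long[] reportedBlockSizes) {
        DatanodeDescriptor d =
            datanodes.computeIfAbsent(nodeId, k -> new DatanodeDescriptor());
        long newTotal = 0;
        for (long size : reportedBlockSizes) newTotal += size;
        totalBlockSize += newTotal - d.totalBlockSize;
        d.totalBlockSize = newTotal;
    }

    // When a datanode is removed, subtract its contribution.
    void removeDatanode(String nodeId) {
        DatanodeDescriptor d = datanodes.remove(nodeId);
        if (d != null) totalBlockSize -= d.totalBlockSize;
    }

    double replicationFactor(long totalDfsFileSize) {
        return (double) totalBlockSize / totalDfsFileSize;
    }
}
```

Because each block report replaces the node's previous contribution, the cluster-wide sum stays correct without rescanning all datanodes.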





> replication factor should be calculated based on actual dfs block sizes at the NameNode.
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-620
>                 URL: http://issues.apache.org/jira/browse/HADOOP-620
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Raghu Angadi
>         Assigned To: Raghu Angadi
>            Priority: Minor
>

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
