hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1855) fsck should verify block placement
Date Thu, 11 Oct 2007 01:40:50 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-1855:
----------------------------------------

    Attachment: FsckBlockPlacement.patch

This patch verifies the replica placement policy. Currently it checks that replicas are placed
on at least two racks if the cluster has multiple racks.
There is a reasonable concern that we should improve our block placement by distributing replicas
across at least replication-1 racks.
This would be beneficial for map-reduce jar and config files, because it increases the likelihood
that a task finds these initial files on its local rack.
The patch contains a method that compares the number of racks a block is actually replicated
to against any required number of racks.
The method can be used in fsck once the improved replication policy is implemented. Until
then we should report only the blocks that are replicated on fewer than 2 racks, in order to
avoid confusion among users and system administrators.
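
To illustrate the check described above, here is a minimal sketch in Python. It is not
the code from FsckBlockPlacement.patch; the names rack_of and missing_racks are hypothetical,
and it assumes each replica location is given as a topology path such as /rack/data-node.

```python
def rack_of(location):
    """Return the rack part of a topology path, e.g. '/rack' for '/rack/data-node'."""
    return location.rsplit("/", 1)[0]


def missing_racks(replica_locations, required_racks, racks_in_cluster):
    """Return how many additional racks the block needs; 0 means it is well placed.

    Assumption: the requirement is capped by the number of racks in the
    cluster and by the number of replicas the block actually has.
    """
    racks_used = {rack_of(loc) for loc in replica_locations}
    achievable = min(required_racks, racks_in_cluster, len(replica_locations))
    return max(0, achievable - len(racks_used))


# Three replicas on one rack of a three-rack cluster, two racks required.
print(missing_racks(["/r1/n1", "/r1/n2", "/r1/n3"], 2, 3))
```

With required_racks set to 2 this corresponds to the current check; passing replication-1
instead would correspond to the stricter policy, if it is adopted.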

Features:
- fsck reports mis-placed blocks as soon as it detects them.
- There is a new "-rack" option, which can be used together with or instead of "-location". If
-rack is specified, fsck prints data-node locations prefixed with a string that defines the
data-node's placement in the cluster topology hierarchy, for example /rack/data-node or
/data-center/rack/data-node.
- fsck also prints the total number of mis-placed blocks.
- some trivial bugs were fixed: for example, instead of printing the number of blocks for each
file, the old version printed the total block count;
  also, the average block replication and the percentage of over-replicated blocks were
calculated incorrectly.
- I included more statistics in the report:
-- number of minimally replicated blocks, which is useful for checking the safe-mode condition;
-- total number of missing replicas;
-- number of data-nodes; and
-- number of racks.
- the fsck help message is updated to reflect the new option and the actual dependencies between options.
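
The statistics listed above could be computed roughly as in the following Python sketch. This
is an illustration only, not the patch's implementation: the function summarize, its parameters
and the dictionary keys are hypothetical. It assumes replica locations are topology paths like
/rack/data-node, that data-nodes and racks are counted from the replicas seen (a real fsck would
get them from the name-node), and that a block counts as mis-placed when it has at least two
replicas, all on one rack, in a cluster with more than one rack.

```python
def summarize(blocks, target_replication, min_replication=1):
    """Summarize placement statistics.

    blocks maps a block id to the list of topology paths of its replicas.
    """
    nodes, racks = set(), set()
    for locations in blocks.values():
        for loc in locations:
            nodes.add(loc)
            racks.add(loc.rsplit("/", 1)[0])  # strip the data-node name

    stats = {
        "minimally_replicated": 0,
        "missing_replicas": 0,
        "misplaced": 0,
        "data_nodes": len(nodes),
        "racks": len(racks),
    }
    for locations in blocks.values():
        block_racks = {loc.rsplit("/", 1)[0] for loc in locations}
        if len(locations) >= min_replication:
            stats["minimally_replicated"] += 1
        stats["missing_replicas"] += max(0, target_replication - len(locations))
        # Assumed rule: only blocks that could span two racks but do not.
        if len(racks) > 1 and len(locations) > 1 and len(block_racks) < 2:
            stats["misplaced"] += 1
    return stats


example = {
    "b1": ["/r1/n1", "/r1/n2", "/r2/n3"],
    "b2": ["/r1/n1", "/r1/n2"],
    "b3": [],
}
print(summarize(example, target_replication=3))
```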


> fsck should verify block placement
> ----------------------------------
>
>                 Key: HADOOP-1855
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1855
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>         Attachments: FsckBlockPlacement.patch
>
>
> fsck currently detects missing and under-replicated blocks. It would be helpful if it
> can also detect blocks that do not conform to the block placement policy. An administrator
> can use this tool to verify that blocks are distributed across racks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

