hadoop-hdfs-issues mailing list archives

From "Patrick Kling (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-1501) The logic that makes namenode exit safemode should be pluggable
Date Fri, 17 Dec 2010 00:49:02 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Patrick Kling updated HDFS-1501:

    Attachment: HDFS-1501.patch

This patch introduces two configuration parameters, dfs.namenode.safemode.policy and dfs.namenode.safemode.policy.manual,
which specify the safe mode policy to use after name node start-up and when manually entering
safe mode, respectively. This will make it easier to use custom safe mode policies (e.g.,
a policy that takes into account when files are RAIDed).

The default implementation for dfs.namenode.safemode.policy, StartupSafeModePolicy, leaves
safe mode once a certain fraction of blocks have reached a safe replication level and once
a specified number of data nodes have checked in (after waiting for an additional extension
period). It also initializes the replication queues once a certain block threshold has been
reached (cf. HDFS-1476). This is the same behaviour currently implemented by FSNamesystem.SafeModeInfo.

The default class for dfs.namenode.safemode.policy.manual, ManualSafeModePolicy, never leaves
safe mode and never initializes the replication queues. Currently, this is achieved by setting
the thresholds in FSNamesystem.SafeModeInfo to values that are so high that they can never
be reached.
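
As a rough illustration of the two default policies described above, here is a minimal Java sketch. It is not taken from HDFS-1501.patch: the interface SafeModePolicySketch, its method names, and the constructor parameters are all hypothetical, and restarting the extension period when the thresholds stop being met is an assumption made for the example.

```java
// Illustrative sketch only. The interface, method names, and parameters are
// assumptions and do not come from HDFS-1501.patch.
interface SafeModePolicySketch {
    /** True once this policy's conditions for leaving safe mode hold. */
    boolean canLeaveSafeMode(long safeBlocks, long totalBlocks,
                             int liveDataNodes, long nowMillis);

    /** True once the replication queues may be initialized. */
    boolean canInitReplQueues(long safeBlocks, long totalBlocks);
}

/** Start-up behaviour: block fraction + data node count + extension period. */
final class StartupPolicySketch implements SafeModePolicySketch {
    private final double leaveThreshold;      // fraction of safely replicated blocks
    private final double replQueueThreshold;  // fraction needed to init replication queues
    private final int minDataNodes;           // data nodes that must have checked in
    private final long extensionMillis;       // extra wait after thresholds are met
    private long reachedAtMillis = -1L;       // when thresholds were first met, or -1

    StartupPolicySketch(double leaveThreshold, double replQueueThreshold,
                        int minDataNodes, long extensionMillis) {
        this.leaveThreshold = leaveThreshold;
        this.replQueueThreshold = replQueueThreshold;
        this.minDataNodes = minDataNodes;
        this.extensionMillis = extensionMillis;
    }

    private static double safeFraction(long safeBlocks, long totalBlocks) {
        // An empty namespace is treated as fully safe.
        return totalBlocks == 0 ? 1.0 : (double) safeBlocks / totalBlocks;
    }

    @Override
    public boolean canLeaveSafeMode(long safeBlocks, long totalBlocks,
                                    int liveDataNodes, long nowMillis) {
        boolean thresholdsMet =
            safeFraction(safeBlocks, totalBlocks) >= leaveThreshold
                && liveDataNodes >= minDataNodes;
        if (!thresholdsMet) {
            reachedAtMillis = -1L;  // assumption: the extension restarts if we fall back
            return false;
        }
        if (reachedAtMillis < 0) {
            reachedAtMillis = nowMillis;  // extension period starts now
        }
        return nowMillis - reachedAtMillis >= extensionMillis;
    }

    @Override
    public boolean canInitReplQueues(long safeBlocks, long totalBlocks) {
        return safeFraction(safeBlocks, totalBlocks) >= replQueueThreshold;
    }
}

/** Manual behaviour: stay in safe mode until an operator leaves it explicitly. */
final class ManualPolicySketch implements SafeModePolicySketch {
    @Override
    public boolean canLeaveSafeMode(long safeBlocks, long totalBlocks,
                                    int liveDataNodes, long nowMillis) {
        return false;  // never leaves on its own; no unreachable thresholds needed
    }

    @Override
    public boolean canInitReplQueues(long safeBlocks, long totalBlocks) {
        return false;
    }
}
```

In this form the manual case states its intent directly instead of relying on threshold values that can never be reached.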

With this patch, FSNamesystem.SafeModeMonitor periodically polls the safe mode policy whenever
the name node is in safe mode. This is different from the current behaviour, which performs
this check after every block report and only uses polling during the safe mode extension phase.
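
The polling approach can be sketched roughly as follows. This is a simplified stand-in, not the FSNamesystem.SafeModeMonitor code from the patch: the class name, the use of a BooleanSupplier in place of the pluggable policy, and the poll interval parameter are assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Illustrative sketch only; not the SafeModeMonitor implementation from the patch.
final class SafeModeMonitorSketch implements Runnable {
    private final BooleanSupplier canLeave;   // stand-in for the pluggable policy
    private final long pollIntervalMillis;    // hypothetical polling interval
    private final AtomicBoolean inSafeMode = new AtomicBoolean(true);
    private int polls = 0;

    SafeModeMonitorSketch(BooleanSupplier canLeave, long pollIntervalMillis) {
        this.canLeave = canLeave;
        this.pollIntervalMillis = pollIntervalMillis;
    }

    @Override
    public void run() {
        // Poll the policy for as long as the name node is in safe mode,
        // rather than re-checking after every block report.
        while (inSafeMode.get()) {
            polls++;
            if (canLeave.getAsBoolean()) {
                inSafeMode.set(false);
                return;
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve interrupt status and stop
                return;
            }
        }
    }

    boolean isInSafeMode() {
        return inSafeMode.get();
    }

    int pollCount() {
        return polls;
    }
}
```

One consequence of this design is that the name node leaves safe mode up to one poll interval after the policy's conditions are first satisfied, in exchange for not evaluating the policy on every block report.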

This patch is still a work in progress and I would appreciate any feedback on this idea.

> The logic that makes namenode exit safemode should be pluggable
> ---------------------------------------------------------------
>                 Key: HDFS-1501
>                 URL: https://issues.apache.org/jira/browse/HDFS-1501
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: dhruba borthakur
>            Assignee: Patrick Kling
>         Attachments: HDFS-1501.patch
> HDFS RAID creates parity blocks for data blocks. So, even if all replicas of a block
are missing, it is possible to recreate it from the parity blocks. Thus, when the namenode
restarts, it should use different, RAID-aware logic to figure out whether all blocks are
healthy or not.
> My proposal is to make the code that the NN uses to exit safemode pluggable.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.