hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-681) Adminstrative hook to pull live nodes out of a HDFS cluster
Date Wed, 29 Nov 2006 23:44:22 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-681?page=comments#action_12454491 ] 
Konstantin Shvachko commented on HADOOP-681:


= Index: src/java/org/apache/hadoop/dfs/ClientProtocol.java
  public static final long versionID = 4L;
Please add a comment stating exactly what changed in ClientProtocol in the new version (compared to the previous one).
That way the changes can be traced back from version to version by looking at the history of changes.
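
One possible layout for such a comment, as a sketch only. The interface name is a hypothetical stand-in for ClientProtocol, and the history entry is a placeholder, not the real change log.

```java
// Hypothetical stand-in for ClientProtocol; only the comment layout matters.
interface ClientProtocolSketch {
  /**
   * Version history (entry text is a placeholder, not the real change log):
   *   4: added decommission-related calls for DFSAdmin.
   * When the protocol changes, bump versionID and append a line here.
   */
  long versionID = 4L;
}
```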

= Index: src/java/org/apache/hadoop/dfs/DFSClient.java
    It may be better to use DatanodeID instead of a plain String as the node parameter in decommission().
    This will require an additional public constructor DatanodeID( String nodeName ).
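
A minimal sketch of that suggestion. Both types below are hypothetical simplifications: the real DatanodeID carries more state, the empty storage id default is an assumption, and the decommission signature is only illustrative.

```java
// Hypothetical, simplified DatanodeID.
class DatanodeIdSketch {
  private final String name;      // node name as the name-node knows it
  private final String storageID; // empty when not known (assumption)

  // The additional public constructor suggested above.
  public DatanodeIdSketch(String nodeName) {
    this(nodeName, "");
  }

  public DatanodeIdSketch(String nodeName, String storageID) {
    this.name = nodeName;
    this.storageID = storageID;
  }

  public String getName() { return name; }
  public String getStorageID() { return storageID; }
}

// Hypothetical client-side call taking a DatanodeID rather than a String.
interface DecommissionClientSketch {
  boolean decommission(DatanodeIdSketch node);
}
```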

= Index: src/java/org/apache/hadoop/dfs/DFSAdmin.java
    decommission() does not document the return value
    boolean mode is never used
    final String safeModeUsage is never used
    The decommission usage string does not specify the data-node parameters;
    it is not clear what identifies data-nodes (host-port or just host, or storage id)

= Index: src/java/org/apache/hadoop/dfs/FSNamesystem.java
    decommissionInProgress() should start with is***()
    replicationInProgress() should start with is***()
    in startDecommission() and stopDecommission() it is better to call the public method getName()
        rather than directly accessing the protected member node.name
    Block decommissionblocks[] should be decommissionBlocks, please check other places too.
    Maybe it is not part of this patch, but at some point we should combine the two methods datanodeReport() and
    DFSNodesStatus() so that the data-node reports for the webUI and DFSShell are the same.

= Index: src/java/org/apache/hadoop/dfs/DatanodeDescriptor.java
    decommissioned() should start with is***()
    I don't think the following constants are used anywhere in the code. They could be confused with
    the enum values that have the same names.
  public static final int NORMAL = 0;
  public static final int DECOMMISSION_INPROGRESS = 1;
  public static final int DECOMMISSIONED = 2;
    I propose renaming AdminStates to DecommissionState and eliminating the NORMAL state, replacing
    it with null where applicable,
    with clear (imo) semantics: no decommission - no state.
    The return value of setAdminState() is never used.
    setAdminState( DECOMMISSIONED ) and setDecommissioned() are two ways to do the same thing.
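
A rough sketch of the proposed shape, assuming a hypothetical holder class in place of DatanodeDescriptor. There is no NORMAL value, null means no decommission, and there is a single void setter.

```java
// Sketch of the proposed rename: null stands for "no decommission".
enum DecommissionState { DECOMMISSION_INPROGRESS, DECOMMISSIONED }

// Hypothetical holder standing in for DatanodeDescriptor.
class DecommissionStateHolder {
  private DecommissionState state; // null: node is not being decommissioned

  // The only way to change the state; void because no caller used the old return value.
  void setDecommissionState(DecommissionState newState) {
    state = newState;
  }

  boolean isDecommissionInProgress() {
    return state == DecommissionState.DECOMMISSION_INPROGRESS;
  }

  boolean isDecommissioned() {
    return state == DecommissionState.DECOMMISSIONED;
  }
}
```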

= Index: src/java/org/apache/hadoop/dfs/DatanodeInfo.java
= Index: src/java/org/apache/hadoop/dfs/DatanodeReport.java
    I don't think this class should be introduced at all.
    DatanodeReport effectively returns an entire DatanodeDescriptor.
    The original design distinguishes DatanodeInfo and DatanodeDescriptor as follows:
    DatanodeDescriptor is an internal name-node class that is never passed to the outside.
    Anything that should be returned to a client or a data-node should be contained in
    DatanodeInfo.
    So it is better to divide the decommission-related methods and members into those that are name-node internal
    and those that are visible to the clients. The former should remain in DatanodeDescriptor,
    and the latter should go into DatanodeInfo. E.g.
    DatanodeInfo has a private decommissionState and the getters isDecommissioned() and isDecommissionInProgress(),
    as well as getDatanodeReport();
    everything else is in DatanodeDescriptor.
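
The split could look roughly like this. The class names carry a Sketch suffix because they are simplified stand-ins; the report format and the protected setter are assumptions made for illustration.

```java
// Client-visible part: what may be returned to clients and data-nodes.
class DatanodeInfoSketch {
  enum DecommissionState { DECOMMISSION_INPROGRESS, DECOMMISSIONED }

  private final String name;
  private DecommissionState decommissionState; // null: no decommission

  DatanodeInfoSketch(String name) { this.name = name; }

  public String getName() { return name; }

  public boolean isDecommissionInProgress() {
    return decommissionState == DecommissionState.DECOMMISSION_INPROGRESS;
  }

  public boolean isDecommissioned() {
    return decommissionState == DecommissionState.DECOMMISSIONED;
  }

  // Report format is an assumption for illustration.
  public String getDatanodeReport() {
    String state = isDecommissioned() ? "Decommissioned"
        : isDecommissionInProgress() ? "Decommission in progress" : "Normal";
    return "Name: " + name + "\nState: " + state;
  }

  // Intended for name-node code (the subclass below) only.
  protected void setDecommissionState(DecommissionState s) { decommissionState = s; }
}

// Name-node internal part: never passed outside the name-node.
class DatanodeDescriptorSketch extends DatanodeInfoSketch {
  DatanodeDescriptorSketch(String name) { super(name); }

  void startDecommission() { setDecommissionState(DecommissionState.DECOMMISSION_INPROGRESS); }
  void setDecommissioned() { setDecommissionState(DecommissionState.DECOMMISSIONED); }
  void stopDecommission() { setDecommissionState(null); }
}
```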

> Adminstrative hook to pull live nodes out of a HDFS cluster
> -----------------------------------------------------------
>                 Key: HADOOP-681
>                 URL: http://issues.apache.org/jira/browse/HADOOP-681
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>    Affects Versions: 0.8.0
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>         Attachments: nodedecommission2.patch
> Introduction
> ------------
> An administrator sometimes needs to bring down a datanode for scheduled maintenance.
It would be nice if HDFS could be informed about this event. On receipt of this event, HDFS
can take steps so that HDFS data is not lost when the node goes down at a later time.
> Architecture
> -----------
> In the existing architecture, a datanode can be in one of two states: dead or alive.
A datanode is alive if its heartbeats are being processed by the namenode. Otherwise that
datanode is in dead state. We extend the architecture to introduce the concept of a tranquil
state for a datanode.
> A datanode is in tranquil state if:
>     - it cannot be a target for replicating any blocks
>     - any block replica that it currently contains does not count towards the target-replication-factor
of that block
> Thus, a node that is in tranquil state can be brought down without impacting the guarantees
provided by HDFS.
> The tranquil state is not persisted across namenode restarts. If the namenode restarts
then that datanode will go back to being in the dead or alive state.
> The datanode is completely unaware that it has been labeled as being
in tranquil state. It can continue to heartbeat and serve read requests for datablocks.
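
The two conditions that define the tranquil state can be sketched as follows. All names here are hypothetical and not taken from the patch; the sketch also assumes that replicas on dead nodes do not count either.

```java
import java.util.List;

// Hypothetical node states for illustration.
enum NodeStateSketch { ALIVE, DEAD, TRANQUIL }

class TranquilReplicationSketch {
  // A tranquil node is never a target for new replicas.
  static boolean canBeTarget(NodeStateSketch state) {
    return state == NodeStateSketch.ALIVE;
  }

  // Replicas on tranquil nodes do not count towards the target replication factor.
  static int countedReplicas(List<NodeStateSketch> replicaHolders) {
    int counted = 0;
    for (NodeStateSketch state : replicaHolders) {
      if (state == NodeStateSketch.ALIVE) {
        counted++;
      }
    }
    return counted;
  }

  static boolean needsReplication(List<NodeStateSketch> replicaHolders, int targetReplication) {
    return countedReplicas(replicaHolders) < targetReplication;
  }
}
```

With holders ALIVE, ALIVE, TRANQUIL and a target of 3, the block still needs one more replica, which is why the tranquil node can later be brought down safely.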
> DFSShell Design
> -----------------------
> We extend the DFS Shell utility to specify a list of nodes to the namenode.
>     hadoop dfs -tranquil {set|clear|get} datanodename1 [,datanodename2]
> The DFSShell utility sends this list to the namenode. This DFSShell command invoked with
the "set" option completes when the list is transferred to the namenode. This command is non-blocking;
it returns before the datanode is actually in the tranquil state. The client can then query
the state by re-issuing the command with the "get" option. This option will indicate whether
the datanode is in tranquil state or is "being tranquiled". The "clear" option is used to
transition a tranquil datanode to the alive state. The "clear" option is a no-op if the datanode
is not in the "tranquil" state.
> ClientProtocol Design
> --------------------
> The ClientProtocol is the protocol exported by the namenode for its client.
> This protocol is extended to incorporate three new methods:
>    ClientProtocol.setTranquil(String[] datanodes)
>    ClientProtocol.getTranquil(String datanode)
>    ClientProtocol.clearTranquil(String[] datanodes)
> The ProtocolVersion is incremented to prevent conversations between incompatible clients
and servers. An old DFSShell cannot talk to the new NameNode and vice-versa.
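
A sketch of the three calls with an in-memory stand-in for the namenode. The return types, the status enum, and the replicationCompleted() helper are assumptions; the description gives only method names and parameters. The "clear" behaviour follows the literal reading of the DFSShell section, where it is a no-op unless the node is tranquil.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical result type for getTranquil().
enum TranquilStatus { NOT_TRANQUIL, BEING_TRANQUILED, TRANQUIL }

// The three proposed methods; return types are assumptions.
interface TranquilProtocolSketch {
  void setTranquil(String[] datanodes) throws IOException;
  TranquilStatus getTranquil(String datanode) throws IOException;
  void clearTranquil(String[] datanodes) throws IOException;
}

// In-memory stand-in for the namenode side; nothing is persisted.
class InMemoryTranquil implements TranquilProtocolSketch {
  private final Map<String, TranquilStatus> status = new HashMap<>();

  @Override
  public void setTranquil(String[] datanodes) {
    // Non-blocking: the node only starts the transition here.
    for (String dn : datanodes) {
      status.putIfAbsent(dn, TranquilStatus.BEING_TRANQUILED);
    }
  }

  @Override
  public TranquilStatus getTranquil(String datanode) {
    return status.getOrDefault(datanode, TranquilStatus.NOT_TRANQUIL);
  }

  @Override
  public void clearTranquil(String[] datanodes) {
    // Removes the entry only if the node is currently TRANQUIL.
    for (String dn : datanodes) {
      status.remove(dn, TranquilStatus.TRANQUIL);
    }
  }

  // Hypothetical hook: called once all blocks of the node are replicated.
  void replicationCompleted(String datanode) {
    status.replace(datanode, TranquilStatus.BEING_TRANQUILED, TranquilStatus.TRANQUIL);
  }
}
```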
> NameNode Design
> -------------------------
> The namenode does the bulk of the work for supporting this new feature.
> The DatanodeInfo object has a new private member named "state". It also has three new
member functions:
>     datanodeInfo.tranquilStarted(): start the process of tranquilization
>     datanodeInfo.tranquilCompleted(): node is now in tranquil state
>     datanodeInfo.clearTranquil() : remove tranquilization from node
> The namenode exposes a new API to set and clear tranquil states for a datanode. On receipt
of a "set tranquil" command, it invokes datanodeInfo.tranquilStarted().
> The FSNamesystem.chooseTarget() method skips over datanodes that are marked as being
in the "tranquil" state. This ensures that tranquil-datanodes are never chosen as targets
of replication. The namenode does *not* record
> this operation in either the FsImage or the EditLogs.
> The namenode puts all the blocks from a being-tranquiled node into the neededReplication
data structure. Necessary code changes are made to ensure that these blocks get replicated
by the regular replication method. As of now, the regular replication code does not distinguish
between these blocks and the blocks that are replication candidates because some other datanode
might have died. It might be prudent to give a different (lower?) weight to this type of
replication request, but that exercise is deferred to a later date. In this design, replication
requests generated because of a node going to a tranquil state are not distinguished from
replication requests generated by a datanode going to the dead state.
> The DatanodeInfo object has another new private member named "pendingTranquilCount".
This field stores the number of blocks that remain to be replicated. This
field is valid only if the node is in the being-tranquiled state. On receipt of every
'n' heartbeats from the being-tranquiled datanode, the namenode calculates the number of blocks
that still remain to be replicated and updates the "pendingTranquilCount" in the DatanodeInfo. When
all the replications complete, the datanode is marked as tranquiled. The number 'n' is selected
in such a way that the average heartbeat processing time does not increase appreciably.
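
The every-'n'-heartbeats recount might be sketched like this. The class and its members are hypothetical, and the IntSupplier stands in for whatever recount the namenode actually performs.

```java
import java.util.function.IntSupplier;

// Hypothetical tracker for one being-tranquiled datanode.
class TranquilProgressSketch {
  private final int recountInterval; // the 'n' in the text
  private int heartbeatsSinceRecount;
  private int pendingTranquilCount;
  private boolean tranquil;

  TranquilProgressSketch(int recountInterval, int initialPending) {
    this.recountInterval = recountInterval;
    this.pendingTranquilCount = initialPending;
  }

  // Called for each heartbeat; the recount runs only on every n-th call
  // so that average heartbeat processing time stays low.
  void onHeartbeat(IntSupplier recount) {
    if (tranquil) {
      return;
    }
    heartbeatsSinceRecount++;
    if (heartbeatsSinceRecount < recountInterval) {
      return;
    }
    heartbeatsSinceRecount = 0;
    pendingTranquilCount = recount.getAsInt();
    if (pendingTranquilCount == 0) {
      tranquil = true; // all replications complete
    }
  }

  int getPendingTranquilCount() { return pendingTranquilCount; }
  boolean isTranquil() { return tranquil; }
}
```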
> It is possible that the namenode might stop receiving heartbeats from a datanode that
is being-tranquiled. In this case, the tranquil flag of the datanode gets cleared. It transitions
to the dead state and the normal processing for alive-to-dead transition occurs here.
> Web Interface
> -------------------
> The dfshealth.jsp displays the live nodes, dead nodes, being-tranquiled and tranquil
nodes. For nodes in the being-tranquiled state, it displays the percentage of tranquilization
completed so far.
> Issues
> --------
> 1. If a request for tranquilization starts getting processed and there isn't enough
space available in DFS to complete the necessary replication, then that node might remain
in the being-tranquiled state for a very long time. This is not necessarily a bad thing but
is there a better option?
> 2. We have opted for not storing cluster configuration information in the persistent
image of the file system. (The tranquil state of a datanode may be lost if the namenode restarts).


