hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-227) Namespace check pointing is not performed until the namenode restarts.
Date Thu, 14 Dec 2006 01:17:24 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-227?page=comments#action_12458332 ] 
            
dhruba borthakur commented on HADOOP-227:
-----------------------------------------

Here is a more detailed writeup on the Backup NameNode proposal. "Secondary NameNode" and
"Backup NameNode" refer to the same node in this writeup. Please review and comment.

Configuration
-------------
There will be an additional file named "masters" in the configuration directory (similar to
the "slaves" file) that will list the node names where Secondary NameNode should be run. The
start-dfs.sh script will start the Secondary-NameNode appropriately.

The configuration file will have the following new definitions:
    * fs.checkpoint.dir      : Location where the Secondary NameNode can download the
                                        fsImage and edits file.
    * fs.checkpoint.period   : Time (in seconds) between two checkpoints.
    * fs.checkpoint.size     : Size (in MB) of edit log that triggers a checkpoint.

The Secondary NameNode will use the "org.apache.hadoop.dfs.NameNode.Alternate" property to log
its debug and informational messages.

Primary NameNode
----------------
The Primary NameNode will add the following new RPCs to the ClientProtocol:

    * getEditLogSize()
        This call returns the size of the current edit log file. This call fails
        if the NameNode is in SafeMode or there is more than one edit log file.

    * rollEditLog()
        This call closes the current edit log and opens a new edit log file.
        The names of the edit files are either "edits" or "edits.1". To keep
        complexity to a minimum, there will be a maximum of two edit log
        files: "edits" and "edits.1".
        This call returns an error if any of the following conditions occur:
        - NameNode is in SafeMode
        - Both "edits" and "edits.1" already exist

    * rollFsImage()
        This call does the following steps (atomically):
        - removes fsImage
        - copies fsImage.tmp to fsImage
        - removes edits
        - moves edits.1 to edits
        This call fails if any of the files fsImage, fsImage.tmp or edits
        does not exist. It also fails if the dfs is in SafeMode.

The NameNode will have two additional servlets:
    * putFsImage.class
        This servlet causes all the incoming data to be stored in a file
        named fsImage.tmp in the dfs.name.dir directory. If this file already
        exists, then this call returns an error.

    * getFile.class?param=pathname
        This servlet retrieves the contents of the specified file.
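
To make the error conditions above easier to review, here is a rough in-memory model of the
three RPCs and the putFsImage servlet. It is only a sketch and not part of any patch: the class
name, the hasFile() and appendEdit() helpers, and the use of IllegalStateException for errors
are all made up for illustration. It uses the names "edits.1" and "fsImage.tmp", assumes that
fsImage.tmp is gone once rollFsImage() succeeds, and also requires edits.1 to exist in
rollFsImage(), since that file has to be moved to edits.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the proposed calls; tracks file names only, not contents. */
final class CheckpointRpcModel {
    private final Set<String> files = new HashSet<>(); // files present in dfs.name.dir
    private boolean safeMode = false;
    private long currentEditLogBytes = 0;              // size of the log being written

    CheckpointRpcModel() {
        files.add("fsImage");
        files.add("edits");
    }

    boolean hasFile(String name) { return files.contains(name); }
    void setSafeMode(boolean on) { safeMode = on; }
    void appendEdit(long bytes) { currentEditLogBytes += bytes; }

    /** Fails in SafeMode or when a second edit log file exists. */
    long getEditLogSize() {
        checkNotInSafeMode();
        if (files.contains("edits.1")) {
            throw new IllegalStateException("more than one edit log file");
        }
        return currentEditLogBytes;
    }

    /** Closes "edits" and starts a new, empty "edits.1". */
    void rollEditLog() {
        checkNotInSafeMode();
        if (files.contains("edits") && files.contains("edits.1")) {
            throw new IllegalStateException("edits and edits.1 both exist");
        }
        files.add("edits.1");
        currentEditLogBytes = 0;
    }

    /** Models the putFsImage servlet: refuses to overwrite an existing upload. */
    void putFsImage() {
        if (!files.add("fsImage.tmp")) {
            throw new IllegalStateException("fsImage.tmp already exists");
        }
    }

    /** Installs the uploaded image and promotes edits.1 to edits. */
    void rollFsImage() {
        checkNotInSafeMode();
        for (String name : new String[] {"fsImage", "fsImage.tmp", "edits", "edits.1"}) {
            if (!files.contains(name)) {
                throw new IllegalStateException(name + " does not exist");
            }
        }
        files.remove("fsImage.tmp"); // assumption: the upload replaces fsImage and disappears
        files.remove("edits.1");     // old edits is dropped; edits.1 takes over its name
    }

    private void checkNotInSafeMode() {
        if (safeMode) {
            throw new IllegalStateException("NameNode is in SafeMode");
        }
    }
}
```

In this model a successful rollEditLog(), putFsImage(), rollFsImage() sequence leaves exactly
fsImage and edits behind, which is the state the next checkpoint expects to start from.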

The Primary NameNode at startup time deletes fsImage.tmp (if it exists). The NameNode loads
the fsImage, then loads edits, and then loads edits.1. It then writes the merged fsImage and
deletes edits and edits.1.
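
The load order matters, because edits.1 holds the newer operations. A toy sketch of the replay,
assuming a made-up record format ("+path" creates a path, "-path" deletes it) and a namespace
that is just a set of paths; none of this reflects the real on-disk formats:

```java
import java.util.List;
import java.util.TreeSet;

/** Hypothetical replay of fsImage + edits + edits.1; the record format is invented. */
final class StartupMergeSketch {
    /** Applies one record: "+/a" creates /a, "-/a" deletes it. */
    static void apply(TreeSet<String> namespace, String record) {
        String path = record.substring(1);
        if (record.charAt(0) == '+') {
            namespace.add(path);
        } else {
            namespace.remove(path);
        }
    }

    /** Loads the image, then replays edits, then edits.1, in that order. */
    static TreeSet<String> merge(List<String> fsImage, List<String> edits, List<String> edits1) {
        TreeSet<String> namespace = new TreeSet<>(fsImage);
        for (String record : edits) {
            apply(namespace, record);
        }
        for (String record : edits1) {
            apply(namespace, record);
        }
        return namespace;
    }
}
```

For example, with an image containing /a, edits of "+/b" and "-/a", and edits.1 of "+/a", the
merged namespace contains both /a and /b. Replaying edits.1 first would lose /a.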


Secondary NameNode
------------------
The Secondary NameNode periodically polls the NameNode with the getEditLogSize() RPC.
This call returns the size of the current edit log. The Secondary NameNode initiates a checkpoint
if either the size of the edit log exceeds the size specified in fs.checkpoint.size or
the time since the last checkpoint completed exceeds fs.checkpoint.period.
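
The trigger condition can be written down roughly as follows. This is a sketch under some
assumptions that the proposal does not spell out: the RPC is taken to report the size in bytes,
1 MB is taken as 1024 * 1024 bytes, and the class and method names are made up.

```java
/** Hypothetical helper for the Secondary NameNode's checkpoint decision. */
final class CheckpointTrigger {
    /**
     * @param editLogBytes     size reported by the edit log size RPC (assumed to be bytes)
     * @param checkpointSizeMb value of fs.checkpoint.size (MB)
     * @param nowMs            current time in milliseconds
     * @param lastCheckpointMs completion time of the last checkpoint in milliseconds
     * @param periodSeconds    value of fs.checkpoint.period (seconds)
     */
    static boolean shouldCheckpoint(long editLogBytes, long checkpointSizeMb,
                                    long nowMs, long lastCheckpointMs, long periodSeconds) {
        boolean logTooLarge = editLogBytes > checkpointSizeMb * 1024L * 1024L;
        boolean periodElapsed = nowMs - lastCheckpointMs > periodSeconds * 1000L;
        return logTooLarge || periodElapsed; // either condition alone is enough
    }
}
```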

The Secondary NameNode issues the rollEditLog() RPC to instruct the Primary NameNode to start
logging edits into edits.1. The Secondary NameNode then uses the getFile servlet to fetch
the contents of fsImage and edits. It puts them in fs.checkpoint.dir, reads them into
memory, merges them, and writes the result to fsImage.tmp. The Secondary NameNode then uploads
the fsImage.tmp file to the Primary NameNode using the putFsImage servlet.

Once the above steps are successful, the Secondary NameNode issues the rollFsImage() RPC.
A checkpoint is complete when this RPC completes successfully.

If any of the RPC calls returns an error, the Secondary NameNode discards all processing that
it might have done, logs an error message, and waits for the normal trigger to start the next
checkpoint.
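
Put together, one checkpoint attempt on the Secondary NameNode looks roughly like the sketch
below. The Primary interface is a made-up stand-in for the RPCs and servlets described above
(file contents are plain strings to keep it short), errors are modelled as unchecked
exceptions, and the merge step is passed in as a function. It is meant to show the ordering and
the "discard and wait for the next trigger" behaviour, nothing more.

```java
import java.util.function.BinaryOperator;

/** Hypothetical outline of one checkpoint attempt; not actual Hadoop code. */
final class CheckpointCycleSketch {
    /** Made-up client-side view of the Primary NameNode's RPCs and servlets. */
    interface Primary {
        void rollEditLog();
        String getFile(String name);
        void putFsImage(String mergedImage);
        void rollFsImage();
    }

    /** Runs one checkpoint; returns false, after logging, if any step fails. */
    static boolean runCheckpoint(Primary nn, BinaryOperator<String> merge) {
        try {
            nn.rollEditLog();                          // Primary starts logging to edits.1
            String image = nn.getFile("fsImage");      // fetched into fs.checkpoint.dir
            String edits = nn.getFile("edits");
            String merged = merge.apply(image, edits); // stands in for the in-memory merge
            nn.putFsImage(merged);                     // stored as fsImage.tmp on the Primary
            nn.rollFsImage();                          // the checkpoint completes here
            return true;
        } catch (RuntimeException e) {
            // Discard all work and wait for the normal trigger to start the next attempt.
            System.err.println("checkpoint abandoned: " + e.getMessage());
            return false;
        }
    }
}
```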

Issues
------
1. The emphasis is on simplicity. For this reason, the NameNode allows at most two outstanding
edits files at any time: edits and edits.1. This ensures that there cannot be more than one
Secondary NameNode for a Primary NameNode.

2. The fact that rollFsImage() fails if either edits or edits.1 does not exist means that
the system is protected against a spurious checkpoint if the NameNode restarts while the
Secondary NameNode is doing a merge. This check can be made more explicit by returning a cookie
from the rollEditLog() call and enforcing that rollFsImage() supplies the same cookie. (The
Primary NameNode resets the cookie if it restarts.)
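
A possible shape for the cookie check, purely as an illustration. The class name, the use of a
counter, and the convention that 0 means "no checkpoint in progress" are assumptions of this
sketch. In this model the counter survives restart(); a real implementation would need some
other way to avoid handing out the same cookie again after a restart.

```java
/** Hypothetical cookie handshake between rollEditLog() and rollFsImage(). */
final class CheckpointCookieSketch {
    private long outstandingCookie = 0; // 0 means no checkpoint is in progress
    private long nextCookie = 1;

    /** rollEditLog() hands out a fresh cookie for the checkpoint it starts. */
    long rollEditLog() {
        outstandingCookie = nextCookie++;
        return outstandingCookie;
    }

    /** rollFsImage() accepts only the cookie of the checkpoint in progress. */
    void rollFsImage(long suppliedCookie) {
        if (outstandingCookie == 0 || suppliedCookie != outstandingCookie) {
            throw new IllegalStateException("stale or missing checkpoint cookie");
        }
        outstandingCookie = 0; // checkpoint finished
    }

    /** A restart of the Primary NameNode forgets the outstanding cookie. */
    void restart() {
        outstandingCookie = 0;
    }
}
```

With this check, a Secondary NameNode that obtained its cookie before a restart gets an error
from rollFsImage() instead of installing an image that no longer matches the edits.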




> Namespace check pointing is not performed until the namenode restarts.
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-227
>                 URL: http://issues.apache.org/jira/browse/HADOOP-227
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Konstantin Shvachko
>         Assigned To: dhruba borthakur
>         Attachments: patch-async-checkpoints-0.9.0, patch-async-checkpoints-0.9.0, patch-async-checkpoints-0.9.0
>
>
> In the current implementation, when the name node starts, it reads its image file, then
> the edits file, and then saves the updated image back into the image file.
> The image file is never updated after that.
> In order to provide system reliability, the namespace information should
> be check pointed periodically, and the edits file should be kept relatively small.


        
