hadoop-common-dev mailing list archives

From "Milind Bhandarkar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-227) Namespace check pointing is not performed until the namenode restarts.
Date Wed, 29 Nov 2006 23:49:23 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-227?page=comments#action_12454492 ] 
Milind Bhandarkar commented on HADOOP-227:

Proposal for Copy-On-Write FileSystem Tree For Periodic Checkpointing

We propose that the hadoop namenode image be checkpointed to disk after
every fixed (configurable) number of transactions.

The checkpointing method we propose:

1. Does not introduce extensive changes in the simple locking model
   currently used in the namesystem (FSNamesystem).
2. Does not fork a heavyweight process to perform checkpointing.
3. Does not lock the entire namesystem during checkpointing.
4. Does not change the image or transaction log format in any way.
5. Does not significantly increase garbage collection activity.

This proposal is based on making the filesystem tree copy-on-write
*only during checkpointing*. We keep track of the number of outstanding
transactions in the main namenode thread. When this number reaches the
configured threshold, dfs.checkpoint.interval (say, 10 million), the
namenode thread that was performing the transaction (in a synchronized
method) performs the following actions:

1. Closes the transaction log 'edits.N', where N is the current
   generation number. (The current fsedits is considered equivalent to
   'edits.0', and the current fsimage is considered to be 'fsimage.0'.)
2. Creates a new transaction log 'edits.<N+1>'.
3. Wakes up a checkpointing thread to dump a new image.
4. Releases the namesystem lock.
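
The trigger described above can be sketched roughly as follows. This is
only an illustration, not the actual FSNamesystem code: the class
CheckpointTrigger and all of its members are hypothetical names, log
files are modeled as strings, and resetting the counter after each roll
is an assumption the proposal does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names are illustrative and are not FSNamesystem's API.
class CheckpointTrigger {
    private final long checkpointInterval;   // stands in for dfs.checkpoint.interval
    private long outstandingTransactions = 0;
    private long generation = 0;             // N in 'edits.N'
    private boolean checkpointRequested = false;
    private final List<String> closedLogs = new ArrayList<>();

    CheckpointTrigger(long checkpointInterval) {
        this.checkpointInterval = checkpointInterval;
    }

    // Called by a server thread after it logs one transaction.
    // 'synchronized' models the single namesystem lock.
    synchronized void transactionLogged() {
        outstandingTransactions++;
        if (outstandingTransactions >= checkpointInterval) {
            closedLogs.add("edits." + generation); // close edits.N
            generation++;                          // later edits go to edits.<N+1>
            outstandingTransactions = 0;           // assumption: the count restarts per log
            checkpointRequested = true;
            notifyAll();                           // wake the checkpointing thread
        }
    }   // the lock is released when the method returns

    // Called by the checkpointing thread; blocks until a checkpoint is due.
    synchronized long awaitCheckpointRequest() throws InterruptedException {
        while (!checkpointRequested) {
            wait();
        }
        checkpointRequested = false;
        return generation;                         // image to write: fsimage.<generation>
    }

    synchronized String currentLog() {
        return "edits." + generation;
    }

    synchronized List<String> closedLogs() {
        return new ArrayList<>(closedLogs);
    }
}
```

The server thread does only constant-time work under the lock here;
writing the image is left entirely to the thread that wakes up.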

This checkpointing thread:

1. Acquires the global namesystem lock.
2. Sets a namenode-global volatile boolean variable
   'checkpointingInProgress' to true.
3. Releases the global lock.
4. Traverses the filesystem tree breadth-first, writing it to disk in
   a file called 'fsimage.<N+1>', and then removes fsimage.N and
   edits.N.
5. After writing the image, reacquires the global namesystem lock.
6. Applies the changes on the shadow nodes to the actual nodes.
7. Sets checkpointingInProgress to false.
8. Releases the global namesystem lock.
9. Sleeps, waiting for notification to do checkpointing again.
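
The breadth-first traversal in the list above might look like the
sketch below. It is an assumption-laden illustration: TreeNode and
ImageWriter are hypothetical names, each node is reduced to a path
string, and the records are collected in a list rather than written in
the real fsimage format, which this proposal leaves unchanged.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a node of the filesystem tree.
class TreeNode {
    final String path;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String path) {
        this.path = path;
    }

    TreeNode add(TreeNode child) {
        children.add(child);
        return this;
    }
}

class ImageWriter {
    // Visits the tree level by level and returns one record per node,
    // parents always before their children.
    static List<String> breadthFirstRecords(TreeNode root) {
        List<String> records = new ArrayList<>();
        Queue<TreeNode> pending = new ArrayDeque<>();
        pending.add(root);
        while (!pending.isEmpty()) {
            TreeNode node = pending.remove();
            records.add(node.path);          // a real writer would serialize the node here
            pending.addAll(node.children);   // children are visited after the current level
        }
        return records;
    }
}
```

For a root "/" with children "/a" and "/b", where "/a" has a child
"/a/x", the records come out as "/", "/a", "/b", "/a/x".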

Step 6 of the checkpointing thread will become clear when we describe
how the namenode server threads change the namesystem tree *while*
checkpointing is in progress.

Namenode server threads always acquire the global namesystem lock
before making any changes to the filesystem tree. Therefore all the
steps described below occur in a critical section.

1. Check if checkpointingInProgress is false.
2. If it is false, perform the requested namesystem changes, exactly as
they are performed currently.
3. If it is true, locate the node of the filesystem tree that needs to
   be changed.
  3.1 If its member named 'shadow' of type 'Inode' is non-null,
      perform the requested changes on that shadow node.
  3.2 Otherwise, create a new shadow Inode, clone all the fields from
      the original Inode into it, and assign it to the 'shadow' field
      of the original Inode. Then perform the requested changes on the
      shadow Inode, and append the original node to a list called
      'changedNodes'.

Step 6 of the checkpointing thread consists of traversing the
'changedNodes' list, replacing the fields of each original node with
those of its shadow node, and resetting the shadow reference to null.
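
A minimal sketch of the shadow-node scheme, assuming an Inode with just
two fields, is given below. The names 'shadow', 'changedNodes' and
'checkpointingInProgress' come from the proposal; the fields 'name' and
'length', the class CowNamesystem and its methods are hypothetical, and
only a single kind of change (setting a length) is modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified Inode: 'name' and 'length' are illustrative fields only.
class Inode {
    String name;
    long length;
    Inode shadow;   // non-null only if changed while a checkpoint is running

    Inode(String name, long length) {
        this.name = name;
        this.length = length;
    }
}

class CowNamesystem {
    private volatile boolean checkpointingInProgress = false;
    private final List<Inode> changedNodes = new ArrayList<>();

    // Steps 1-3 of the checkpointing thread.
    synchronized void beginCheckpoint() {
        checkpointingInProgress = true;
    }

    // A server-thread change, made under the namesystem lock.
    synchronized void setLength(Inode node, long newLength) {
        if (!checkpointingInProgress) {
            node.length = newLength;          // normal path, unchanged behavior
            return;
        }
        if (node.shadow == null) {
            // First change during this checkpoint: clone into a shadow.
            node.shadow = new Inode(node.name, node.length);
            changedNodes.add(node);
        }
        node.shadow.length = newLength;       // the original stays frozen for the writer
    }

    // Steps 5-8 of the checkpointing thread.
    synchronized void endCheckpoint() {
        for (Inode node : changedNodes) {
            node.name = node.shadow.name;     // copy shadow fields back
            node.length = node.shadow.length;
            node.shadow = null;
        }
        changedNodes.clear();
        checkpointingInProgress = false;
    }
}
```

Because changes made during a checkpoint touch only the shadow, the
checkpointing thread can read the original fields without holding the
lock. Extra allocation is limited to nodes actually modified while the
image is being written.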

With this checkpointing scheme, the namenode startup procedure remains
unchanged, except that now the namenode looks for a valid fsimage.N
with the maximum N in the dfs.name.dir directories.
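
Choosing the image with the highest generation number at startup could
be done along these lines. The class ImageSelector and its method are
hypothetical, the input is a plain list of file names, and the sketch
checks only the name pattern; what makes an image "valid" is not
defined in this proposal and is not modeled.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImageSelector {
    // Matches names of the form fsimage.N, where N is a decimal number.
    private static final Pattern IMAGE_NAME = Pattern.compile("fsimage\\.(\\d{1,18})");

    // Returns the largest N among names of the form fsimage.N, if any.
    static OptionalLong latestGeneration(List<String> fileNames) {
        long best = -1;
        for (String name : fileNames) {
            Matcher m = IMAGE_NAME.matcher(name);
            if (m.matches()) {
                best = Math.max(best, Long.parseLong(m.group(1)));
            }
        }
        return best < 0 ? OptionalLong.empty() : OptionalLong.of(best);
    }
}
```

Comparing N numerically rather than as text matters here, since
"fsimage.10" sorts before "fsimage.3" as a string.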

> Namespace check pointing is not performed until the namenode restarts.
> ----------------------------------------------------------------------
>                 Key: HADOOP-227
>                 URL: http://issues.apache.org/jira/browse/HADOOP-227
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Konstantin Shvachko
>         Assigned To: Milind Bhandarkar
> In the current implementation, when the name node starts, it reads its image file, then
> the edits file, and then saves the updated image back into the image file.
> The image file is never updated after that.
> In order to provide system reliability, the namespace information should
> be check pointed periodically, and the edits file should be kept relatively small.
