hadoop-common-commits mailing list archives

From zjs...@apache.org
Subject [36/50] hadoop git commit: HDFS-8326. Documentation about when checkpoints are run is out of date. (Misty Stanley-Jones via xyao)
Date Sat, 09 May 2015 00:42:30 GMT
HDFS-8326. Documentation about when checkpoints are run is out of date. (Misty Stanley-Jones
via xyao)


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/cf22f0b2
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/cf22f0b2
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/cf22f0b2

Branch: refs/heads/YARN-2928
Commit: cf22f0b2d1af8614214ceb60593430554067229b
Parents: 6dac400
Author: Xiaoyu Yao <xyao@apache.org>
Authored: Fri May 8 14:46:25 2015 -0700
Committer: Zhijie Shen <zjshen@apache.org>
Committed: Fri May 8 17:32:52 2015 -0700

----------------------------------------------------------------------
 hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt                     | 3 +++
 hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md | 5 +++--
 2 files changed, 6 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/cf22f0b2/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
index fedef79..b766e26 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
+++ b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
@@ -720,6 +720,9 @@ Release 2.8.0 - UNRELEASED
     HDFS-8311. DataStreamer.transfer() should timeout the socket InputStream.
     (Esteban Gutierrez via Yongjun Zhang)
 
+    HDFS-8326. Documentation about when checkpoints are run is out of date.
+    (Misty Stanley-Jones via xyao)
+
 Release 2.7.1 - UNRELEASED
 
   INCOMPATIBLE CHANGES

http://git-wip-us.apache.org/repos/asf/hadoop/blob/cf22f0b2/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
index 5a8e366..a30877a 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
@@ -135,9 +135,10 @@ The Persistence of File System Metadata
 
 The HDFS namespace is stored by the NameNode. The NameNode uses a transaction log called
the EditLog to persistently record every change that occurs to file system metadata. For example,
creating a new file in HDFS causes the NameNode to insert a record into the EditLog indicating
this. Similarly, changing the replication factor of a file causes a new record to be inserted
into the EditLog. The NameNode uses a file in its local host OS file system to store the EditLog.
The entire file system namespace, including the mapping of blocks to files and file system
properties, is stored in a file called the FsImage. The FsImage is stored as a file in the
NameNode’s local file system too.
 
-The NameNode keeps an image of the entire file system namespace and file Blockmap in memory.
This key metadata item is designed to be compact, such that a NameNode with 4 GB of RAM is
plenty to support a huge number of files and directories. When the NameNode starts up, it
reads the FsImage and EditLog from disk, applies all the transactions from the EditLog to
the in-memory representation of the FsImage, and flushes out this new version into a new FsImage
on disk. It can then truncate the old EditLog because its transactions have been applied to
the persistent FsImage. This process is called a checkpoint. In the current implementation,
a checkpoint only occurs when the NameNode starts up. Work is in progress to support periodic
checkpointing in the near future.
+The NameNode keeps an image of the entire file system namespace and file Blockmap in memory.
When the NameNode starts up, or a checkpoint is triggered by a configurable threshold, it
reads the FsImage and EditLog from disk, applies all the transactions from the EditLog to
the in-memory representation of the FsImage, and flushes out this new version into a new FsImage
on disk. It can then truncate the old EditLog because its transactions have been applied to
the persistent FsImage. This process is called a checkpoint. The purpose of a checkpoint is
to make sure that HDFS has a consistent view of the file system metadata by taking a snapshot
of the file system metadata and saving it to FsImage. Even though it is efficient to read
a FsImage, it is not efficient to make incremental edits directly to a FsImage. Instead of
modifying FsImage for each edit, we persist the edits in the Editlog. During the checkpoint
the changes from Editlog are applied to the FsImage. A checkpoint can be triggered
at a given time interval (`dfs.namenode.checkpoint.period`) expressed in seconds,
or after a given number of filesystem transactions have accumulated (`dfs.namenode.checkpoint.txns`).
If both of these properties are set, the first threshold to be reached triggers a checkpoint.
+
+The DataNode stores HDFS data in files in its local file system. The DataNode has no knowledge
about HDFS files. It stores each block of HDFS data in a separate file in its local file system.
The DataNode does not create all files in the same directory. Instead, it uses a heuristic
to determine the optimal number of files per directory and creates subdirectories appropriately.
 It is not optimal to create all local files in the same directory because the local file
system might not be able to efficiently support a huge number of files in a single directory.
When a DataNode starts up, it scans through its local file system, generates a list of all
HDFS data blocks that correspond to each of these local files, and sends this report to the
NameNode. The report is called the _Blockreport_.
 
-The DataNode stores HDFS data in files in its local file system. The DataNode has no knowledge
about HDFS files. It stores each block of HDFS data in a separate file in its local file system.
The DataNode does not create all files in the same directory. Instead, it uses a heuristic
to determine the optimal number of files per directory and creates subdirectories appropriately.
It is not optimal to create all local files in the same directory because the local file system
might not be able to efficiently support a huge number of files in a single directory. When
a DataNode starts up, it scans through its local file system, generates a list of all HDFS
data blocks that correspond to each of these local files and sends this report to the NameNode:
this is the Blockreport.
 
 The Communication Protocols
 ---------------------------
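
The two checkpoint thresholds named in the updated HdfsDesign.md text are standard HDFS properties set in `hdfs-site.xml`. A minimal sketch follows; the values shown (3600 seconds, 1,000,000 transactions) are the commonly documented defaults, but verify them against the `hdfs-default.xml` shipped with your Hadoop release:

```xml
<!-- hdfs-site.xml: the checkpoint triggers described in HDFS-8326.
     Whichever threshold is reached first triggers a checkpoint. -->
<configuration>
  <property>
    <name>dfs.namenode.checkpoint.period</name>
    <!-- seconds between checkpoints -->
    <value>3600</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.txns</name>
    <!-- number of uncheckpointed EditLog transactions that forces a checkpoint -->
    <value>1000000</value>
  </property>
</configuration>
```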

