hadoop-common-commits mailing list archives

From iwasak...@apache.org
Subject hadoop git commit: HDFS-11995. HDFS Architecture documentation incorrectly describes writing to a local temporary file. Contributed by Nandakumar.
Date Mon, 19 Jun 2017 23:08:11 GMT
Repository: hadoop
Updated Branches:
  refs/heads/trunk 73fb75017 -> d954a6473

HDFS-11995. HDFS Architecture documentation incorrectly describes writing to a local temporary
file. Contributed by Nandakumar.

Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/d954a647
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/d954a647
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/d954a647

Branch: refs/heads/trunk
Commit: d954a64730c00346476322743462cde857164177
Parents: 73fb750
Author: Masatake Iwasaki <iwasakims@apache.org>
Authored: Tue Jun 20 08:07:42 2017 +0900
Committer: Masatake Iwasaki <iwasakims@apache.org>
Committed: Tue Jun 20 08:07:42 2017 +0900

 .../hadoop-hdfs/src/site/markdown/HdfsDesign.md | 33 +++-----------------
 1 file changed, 4 insertions(+), 29 deletions(-)

diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
index 4bf1897..76cd2bf 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
@@ -201,38 +201,13 @@ A typical block size used by HDFS is 128 MB.
 Thus, an HDFS file is chopped up into 128 MB chunks, and if possible,
 each chunk will reside on a different DataNode.
-### Staging
-A client request to create a file does not reach the NameNode immediately.
-In fact, initially the HDFS client caches the file data into a local buffer.
-Application writes are transparently redirected to this local buffer.
-When the local file accumulates data worth over one chunk size, the client contacts the NameNode.
-The NameNode inserts the file name into the file system hierarchy and allocates a data block for it.
-The NameNode responds to the client request with the identity of the DataNode and the destination data block.
-Then the client flushes the chunk of data from the local buffer to the specified DataNode.
-When a file is closed, the remaining un-flushed data in the local buffer is transferred to the DataNode.
-The client then tells the NameNode that the file is closed. At this point,
-the NameNode commits the file creation operation into a persistent store.
-If the NameNode dies before the file is closed, the file is lost.
-The above approach has been adopted after careful consideration of target applications that run on HDFS.
-These applications need streaming writes to files.
-If a client writes to a remote file directly without any client side buffering,
-the network speed and the congestion in the network impacts throughput considerably.
-This approach is not without precedent.
-Earlier distributed file systems, e.g. AFS, have used client side caching to improve performance.
-A POSIX requirement has been relaxed to achieve higher performance of data uploads.
 ### Replication Pipelining
-When a client is writing data to an HDFS file,
-its data is first written to a local buffer as explained in the previous section.
-Suppose the HDFS file has a replication factor of three.
-When the local buffer accumulates a chunk of user data,
-the client retrieves a list of DataNodes from the NameNode.
+When a client is writing data to an HDFS file with a replication factor of three,
+the NameNode retrieves a list of DataNodes using a replication target choosing algorithm.
 This list contains the DataNodes that will host a replica of that block.
-The client then flushes the data chunk to the first DataNode.
-The first DataNode starts receiving the data in small portions,
+The client then writes to the first DataNode.
+The first DataNode starts receiving the data in portions,
 writes each portion to its local repository and transfers that portion to the second DataNode in the list.
 The second DataNode, in turn starts receiving each portion of the data block,
 writes that portion to its repository and then flushes that portion to the third DataNode.
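
The pipelining described in the updated text can be illustrated with a small toy model. This is only a sketch under stated assumptions, not HDFS code: the names `DataNode`, `build_pipeline`, `write_block` and `portion_size` are hypothetical, the forwarding is synchronous and in-memory, and acknowledgements and failure handling are left out entirely.

```python
from typing import List, Optional


class DataNode:
    """Toy stand-in for a DataNode: stores portions and forwards them downstream."""

    def __init__(self, name: str, downstream: Optional["DataNode"] = None) -> None:
        self.name = name
        self.downstream = downstream
        self.stored = bytearray()  # stands in for the node's local repository

    def receive(self, portion: bytes) -> None:
        # Write the portion locally, then pass the same portion along the pipeline.
        self.stored.extend(portion)
        if self.downstream is not None:
            self.downstream.receive(portion)


def build_pipeline(names: List[str]) -> List[DataNode]:
    """Chain the nodes so each one forwards to the next name in the list."""
    nodes: List[DataNode] = []
    downstream = None
    for name in reversed(names):
        node = DataNode(name, downstream)
        nodes.append(node)
        downstream = node
    nodes.reverse()
    return nodes


def write_block(data: bytes, pipeline: List[DataNode], portion_size: int = 4) -> None:
    """Client side: send the block to the first node only, one portion at a time."""
    for offset in range(0, len(data), portion_size):
        pipeline[0].receive(data[offset:offset + portion_size])
```

With a three-node list such as `build_pipeline(["dn1", "dn2", "dn3"])`, the client only ever talks to `dn1`, yet every node ends up holding the full block, which mirrors the replication factor of three used in the documentation text.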

