hadoop-hdfs-commits mailing list archives

From t...@apache.org
Subject svn commit: r1131223 - in /hadoop/hdfs/trunk: CHANGES.txt src/docs/src/documentation/content/xdocs/hdfs_design.xml
Date Fri, 03 Jun 2011 20:44:13 GMT
Author: todd
Date: Fri Jun  3 20:44:13 2011
New Revision: 1131223

URL: http://svn.apache.org/viewvc?rev=1131223&view=rev
Log:
HDFS-1454. Update the documentation to reflect that clients don't write blocks to local disk
before copying to HDFS. Contributed by Harsh J Chouraria.

Modified:
    hadoop/hdfs/trunk/CHANGES.txt
    hadoop/hdfs/trunk/src/docs/src/documentation/content/xdocs/hdfs_design.xml

Modified: hadoop/hdfs/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/hdfs/trunk/CHANGES.txt?rev=1131223&r1=1131222&r2=1131223&view=diff
==============================================================================
--- hadoop/hdfs/trunk/CHANGES.txt (original)
+++ hadoop/hdfs/trunk/CHANGES.txt Fri Jun  3 20:44:13 2011
@@ -921,6 +921,9 @@ Release 0.22.0 - Unreleased
 
     HDFS-1957. Add documentation for HFTP. (Ari Rabkin via todd)
 
+    HDFS-1454. Update the documentation to reflect that clients don't write
+    blocks to local disk before copying to HDFS. (Harsh J Chouraria via todd)
+
   OPTIMIZATIONS
 
     HDFS-1140. Speedup INode.getPathComponents. (Dmytro Molkov via shv)

Modified: hadoop/hdfs/trunk/src/docs/src/documentation/content/xdocs/hdfs_design.xml
URL: http://svn.apache.org/viewvc/hadoop/hdfs/trunk/src/docs/src/documentation/content/xdocs/hdfs_design.xml?rev=1131223&r1=1131222&r2=1131223&view=diff
==============================================================================
--- hadoop/hdfs/trunk/src/docs/src/documentation/content/xdocs/hdfs_design.xml (original)
+++ hadoop/hdfs/trunk/src/docs/src/documentation/content/xdocs/hdfs_design.xml Fri Jun  3 20:44:13 2011
@@ -387,41 +387,11 @@
         </p>
       </section>
 
- 
-      <section>
-        <!-- XXX staging never described / referenced in its section -->
-        <title> Staging </title>
-        <p>
-        A client request to create a file does not reach the NameNode immediately. In fact, initially the HDFS 
-        client caches the file data into a temporary local file. Application writes are transparently redirected to 
-        this temporary local file. When the local file accumulates data worth over one HDFS block size, the 
-        client contacts the NameNode. The NameNode inserts the file name into the file system hierarchy 
-        and allocates a data block for it. The NameNode responds to the client request with the identity 
-        of the DataNode and the destination data block. Then the client flushes the block of data from the 
-        local temporary file to the specified DataNode. When a file is closed, the remaining un-flushed data 
-        in the temporary local file is transferred to the DataNode. The client then tells the NameNode that 
-        the file is closed. At this point, the NameNode commits the file creation operation into a persistent 
-        store. If the NameNode dies before the file is closed, the file is lost. 
-        </p>
-        <p>
-        The above approach has been adopted after careful consideration of target applications that run on 
-        HDFS. These applications need streaming writes to files. If a client writes to a remote file directly 
-        without any client side buffering, the network speed and the congestion in the network impacts 
-        throughput considerably. This approach is not without precedent. Earlier distributed file systems, 
-        e.g. <acronym title="Andrew File System">AFS</acronym>, have used client side caching to 
-        improve performance. A POSIX requirement has been relaxed to achieve higher performance of 
-        data uploads. 
-        </p>
-      </section>
-
       <section>
         <title> Replication Pipelining </title>
         <p>
-        When a client is writing data to an HDFS file, its data is first written to a local file as explained 
-        in the previous section. Suppose the HDFS file has a replication factor of three. When the local 
-        file accumulates a full block of user data, the client retrieves a list of DataNodes from the NameNode. 
-        This list contains the DataNodes that will host a replica of that block. The client then flushes the 
-        data block to the first DataNode. The first DataNode starts receiving the data in small portions (4 KB), 
+        When a client is writing data to an HDFS file with a replication factor of 3, the NameNode retrieves a list of DataNodes using a replication target choosing algorithm.
+        This list contains the DataNodes that will host a replica of that block. The client then writes to the first DataNode. The first DataNode starts receiving the data in small portions (4 KB), 
         writes each portion to its local repository and transfers that portion to the second DataNode in the list. 
         The second DataNode, in turn starts receiving each portion of the data block, writes that portion to its 
         repository and then flushes that portion to the third DataNode. Finally, the third DataNode writes the 


