Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size. Thus, HDFS is tuned to
- support large files. It should provide high aggregate data bandwidth and scale to hundreds of nodes in a single cluster. It should support
+ support large files. It should provide high aggregate data bandwidth and scale to thousands of nodes in a single cluster. It should support
tens of millions of files in a single instance.
- HDFS applications need a write-once-read-many access model for files. A file once created, written, and closed need not be changed.
- This assumption simplifies data coherency issues and enables high throughput data access. A MapReduce application or a web crawler
- application fits perfectly with this model. There is a plan to support appending-writes to files in the future.
+ Most HDFS applications need a write-once-read-many access model for files. HDFS provides two additional advanced features: hflush and
+ append. Hflush makes the last block of an unclosed file visible to readers while providing read consistency and data durability. Append
+ provides a mechanism for opening a closed file to add additional data.
+
+ For complete details of the hflush and append design, see the
+ Append/Hflush/Read Design document (PDF).
HDFS supports a traditional hierarchical file organization. A user or an application can create directories and store files inside
these directories. The file system namespace hierarchy is similar to most other existing file systems; one can create and
- remove files, move a file from one directory to another, or rename a file. HDFS does not yet implement user quotas. HDFS
- does not support hard links or soft links. However, the HDFS architecture does not preclude implementing these features.
+ remove files, move a file from one directory to another, or rename a file. HDFS implements user quotas for number of names and
+ amount of data stored in a particular directory (See
+ HDFS Quota Admin Guide). In addition, HDFS
+ supports symbolic links.
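
The two quota kinds mentioned in the added text (number of names and amount of data per directory) can be pictured with a small model. This is only an illustrative sketch, not HDFS code: the `Directory` class, its fields, and `QuotaExceeded` are hypothetical names, and the real accounting rules are described in the HDFS Quota Admin Guide.

```python
from dataclasses import dataclass


class QuotaExceeded(Exception):
    """Raised when adding a file would violate a directory quota (hypothetical)."""


@dataclass
class Directory:
    # Hypothetical model of one directory with both quota kinds.
    name_quota: int       # maximum number of names allowed under the directory
    space_quota: int      # maximum number of bytes allowed under the directory
    names_used: int = 0
    bytes_used: int = 0

    def add_file(self, size_bytes: int) -> None:
        # Check both limits before changing any counter, so a rejected
        # request leaves the directory state untouched.
        if self.names_used + 1 > self.name_quota:
            raise QuotaExceeded("name quota exceeded")
        if self.bytes_used + size_bytes > self.space_quota:
            raise QuotaExceeded("space quota exceeded")
        self.names_used += 1
        self.bytes_used += size_bytes
```

In this model a request is rejected as a whole: a file that fits the name quota but not the space quota changes neither counter.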
The NameNode maintains the file system namespace. Any change to the file system namespace or its properties is
@@ -163,8 +169,8 @@
HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence
of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance.
The block size and replication factor are configurable per file. An application can specify the number of replicas of a file.
- The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once and
- have strictly one writer at any time.
+ The replication factor can be specified at file creation time and can be changed later. Files in HDFS have strictly one writer at any
+ time.
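
The block layout described in this paragraph (every block except the last has the same size, and each block is replicated) can be worked through with a little arithmetic. The sketch below is an illustration only; `block_sizes` and `raw_storage` are hypothetical helper names, and the sizes in the usage note are arbitrary units rather than HDFS defaults.

```python
def block_sizes(file_size: int, block_size: int) -> list[int]:
    """Return the size of each block of a file, in order.

    All blocks except possibly the last are exactly block_size long.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last, shorter block
    return sizes


def raw_storage(file_size: int, replication: int) -> int:
    """Total bytes stored across the cluster when every block has `replication` replicas."""
    if replication < 1:
        raise ValueError("replication must be >= 1")
    return file_size * replication
```

For example, a file of 300 units with a block size of 128 yields blocks of 128, 128, and 44 units, and with three replicas it occupies 900 units of raw storage.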
The NameNode makes all decisions regarding replication of blocks. It periodically receives a Heartbeat and a Blockreport
@@ -208,7 +214,8 @@
data reliability or read performance.
- The current, default replica placement policy described here is a work in progress.
+ In addition to the default placement policy described above, HDFS also provides a pluggable interface for block placement. See
+ BlockPlacementPolicy.
To minimize global bandwidth consumption and read latency, HDFS tries to satisfy a read request from a replica
that is closest to the reader. If there exists a replica on the same rack as the reader node, then that replica is
- preferred to satisfy the read request. If an HDFS cluster spans multiple data centers, then a replica that is
+ preferred to satisfy the read request. If an HDFS cluster spans multiple data centers, then a replica that is
resident in the local data center is preferred over any remote replica.
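
The preference order in this paragraph (same rack first, then the local data center, then anything remote) can be sketched as a ranking function. This is a simplified model under stated assumptions, not the HDFS implementation: the `Location` tuple, `distance_rank`, and `choose_replica` are hypothetical names, and ties are broken by list order purely for simplicity.

```python
from typing import NamedTuple, Sequence


class Location(NamedTuple):
    # Hypothetical description of where a node lives in the cluster.
    data_center: str
    rack: str
    node: str


def distance_rank(reader: Location, replica: Location) -> int:
    """Lower is closer: 0 = same rack, 1 = same data center, 2 = remote."""
    same_dc = reader.data_center == replica.data_center
    if same_dc and reader.rack == replica.rack:
        return 0
    if same_dc:
        return 1
    return 2


def choose_replica(reader: Location, replicas: Sequence[Location]) -> Location:
    """Pick the replica closest to the reader; the earliest entry wins a tie."""
    if not replicas:
        raise ValueError("no replicas available")
    return min(replicas, key=lambda r: distance_rank(reader, r))
```

A reader on rack `r1` of data center `dc1` would be served from a replica on `dc1/r1` if one exists, otherwise from any replica in `dc1`, and only then from another data center.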
The DataNode stores HDFS data in files in its local file system. The DataNode has no knowledge about HDFS files.

Propchange: hadoop/hdfs/branches/yahoo-merge/src/java/
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Fri May 20 18:01:50 2011
@@ -3,4 +3,4 @@
/hadoop/hdfs/branches/HDFS-1052/src/java:1078924,1078943,1080331,1080391,1080402,1081603,1082326,1084245,1086788,1090419
/hadoop/hdfs/branches/HDFS-265/src/java:796829-820463
/hadoop/hdfs/branches/branch-0.21/src/java:820487
-/hadoop/hdfs/trunk/src/java:987665-1004788,1026178-1028906,1032470-1033639,1034073,1034082-1034181,1034501-1034544,1035508,1039957,1040005,1052823,1060619,1061067,1062020,1062045,1062052,1071518,1080380,1080836,1083951,1087080,1091619,1092584,1095245,1095789,1096846,1097648,1097969,1098867,1099640,1101324,1101753,1104395,1104407,1124576
+/hadoop/hdfs/trunk/src/java:987665-1004788,1026178-1028906,1032470-1033639,1034073,1034082-1034181,1034501-1034544,1035508,1039957,1040005,1052823,1060619,1061067,1062020,1062045,1062052,1071518,1074282,1080380,1080836,1083951,1087080,1091619,1092584,1095245,1095789,1096846,1097648,1097969,1098867,1099640,1101324,1101753,1104395,1104407,1124576

Propchange: hadoop/hdfs/branches/yahoo-merge/src/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Fri May 20 18:01:50 2011
@@ -5,4 +5,4 @@
/hadoop/hdfs/branches/HDFS-1052/src/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java:1078924,1078943,1080331,1080391,1080402,1081603,1082326,1084245,1086788,1090419
/hadoop/hdfs/branches/HDFS-265/src/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java:796829-820463
/hadoop/hdfs/branches/branch-0.21/src/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java:820487
-/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java:1026178-1028906,1032470-1033639,1034073,1034082-1034181,1034501-1034544,1035508,1039957,1040005,1052823,1060619,1061067,1062020,1062045,1062052,1071518,1080380,1080836,1083951,1087080,1091619,1092584,1095245,1095789,1096846,1097648,1097969,1098867,1099640,1101324,1101753,1104395,1104407,1124576
+/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java:1026178-1028906,1032470-1033639,1034073,1034082-1034181,1034501-1034544,1035508,1039957,1040005,1052823,1060619,1061067,1062020,1062045,1062052,1071518,1074282,1080380,1080836,1083951,1087080,1091619,1092584,1095245,1095789,1096846,1097648,1097969,1098867,1099640,1101324,1101753,1104395,1104407,1124576

Propchange: hadoop/hdfs/branches/yahoo-merge/src/test/hdfs/
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Fri May 20 18:01:50 2011
@@ -3,4 +3,4 @@
/hadoop/hdfs/branches/HDFS-1052/src/test/hdfs:1078924,1078943,1080331,1080391,1080402,1081603,1082326,1084245,1086788,1090419
/hadoop/hdfs/branches/HDFS-265/src/test/hdfs:796829-820463
/hadoop/hdfs/branches/branch-0.21/src/test/hdfs:820487
-/hadoop/hdfs/trunk/src/test/hdfs:987665-1004788,1026178-1028906,1032470-1033639,1034073,1034082-1034181,1034501-1034544,1035508,1039957,1040005,1052823,1060619,1061067,1062020,1062045,1062052,1071518,1080380,1080836,1083951,1087080,1091619,1092584,1095245,1095789,1096846,1097648,1097969,1098867,1099640,1101324,1101753,1104395,1104407,1124576
+/hadoop/hdfs/trunk/src/test/hdfs:987665-1004788,1026178-1028906,1032470-1033639,1034073,1034082-1034181,1034501-1034544,1035508,1039957,1040005,1052823,1060619,1061067,1062020,1062045,1062052,1071518,1074282,1080380,1080836,1083951,1087080,1091619,1092584,1095245,1095789,1096846,1097648,1097969,1098867,1099640,1101324,1101753,1104395,1104407,1124576

Propchange: hadoop/hdfs/branches/yahoo-merge/src/webapps/datanode/
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Fri May 20 18:01:50 2011
@@ -3,4 +3,4 @@
/hadoop/hdfs/branches/HDFS-1052/src/webapps/datanode:1078924,1078943,1080331,1080391,1080402,1081603,1082326,1084245,1086788,1090419
/hadoop/hdfs/branches/HDFS-265/src/webapps/datanode:796829-820463
/hadoop/hdfs/branches/branch-0.21/src/webapps/datanode:820487
-/hadoop/hdfs/trunk/src/webapps/datanode:987665-1004788,1026178-1028906,1032470-1033639,1034073,1034082-1034181,1034501-1034544,1035508,1039957,1040005,1052823,1060619,1061067,1062020,1062045,1062052,1071518,1080380,1080836,1083951,1087080,1091619,1092584,1095245,1095789,1096846,1097648,1097969,1098867,1099640,1101324,1101753,1104395,1104407,1124576
+/hadoop/hdfs/trunk/src/webapps/datanode:987665-1004788,1026178-1028906,1032470-1033639,1034073,1034082-1034181,1034501-1034544,1035508,1039957,1040005,1052823,1060619,1061067,1062020,1062045,1062052,1071518,1074282,1080380,1080836,1083951,1087080,1091619,1092584,1095245,1095789,1096846,1097648,1097969,1098867,1099640,1101324,1101753,1104395,1104407,1124576

Propchange: hadoop/hdfs/branches/yahoo-merge/src/webapps/hdfs/
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Fri May 20 18:01:50 2011
@@ -3,4 +3,4 @@
/hadoop/hdfs/branches/HDFS-1052/src/webapps/hdfs:1078924,1078943,1080331,1080391,1080402,1081603,1082326,1084245,1086788,1090419
/hadoop/hdfs/branches/HDFS-265/src/webapps/hdfs:796829-820463
/hadoop/hdfs/branches/branch-0.21/src/webapps/hdfs:820487
-/hadoop/hdfs/trunk/src/webapps/hdfs:987665-1004788,1026178-1028906,1032470-1033639,1034073,1034082-1034181,1034501-1034544,1035508,1039957,1040005,1052823,1060619,1061067,1062020,1062045,1062052,1071518,1080380,1080836,1083951,1087080,1091619,1092584,1095245,1095789,1096846,1097648,1097969,1098867,1099640,1101324,1101753,1104395,1104407,1124576
+/hadoop/hdfs/trunk/src/webapps/hdfs:987665-1004788,1026178-1028906,1032470-1033639,1034073,1034082-1034181,1034501-1034544,1035508,1039957,1040005,1052823,1060619,1061067,1062020,1062045,1062052,1071518,1074282,1080380,1080836,1083951,1087080,1091619,1092584,1095245,1095789,1096846,1097648,1097969,1098867,1099640,1101324,1101753,1104395,1104407,1124576

Propchange: hadoop/hdfs/branches/yahoo-merge/src/webapps/secondary/
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Fri May 20 18:01:50 2011
@@ -3,4 +3,4 @@
/hadoop/hdfs/branches/HDFS-1052/src/webapps/secondary:1078924,1078943,1080331,1080391,1080402,1081603,1082326,1084245,1086788,1090419
/hadoop/hdfs/branches/HDFS-265/src/webapps/secondary:796829-820463
/hadoop/hdfs/branches/branch-0.21/src/webapps/secondary:820487
-/hadoop/hdfs/trunk/src/webapps/secondary:987665-1004788,1026178-1028906,1032470-1033639,1034073,1034082-1034181,1034501-1034544,1035508,1039957,1040005,1052823,1060619,1061067,1062020,1062045,1062052,1071518,1080380,1080836,1083951,1087080,1091619,1092584,1095245,1095789,1096846,1097648,1097969,1098867,1099640,1101324,1101753,1104395,1104407,1124576
+/hadoop/hdfs/trunk/src/webapps/secondary:987665-1004788,1026178-1028906,1032470-1033639,1034073,1034082-1034181,1034501-1034544,1035508,1039957,1040005,1052823,1060619,1061067,1062020,1062045,1062052,1071518,1074282,1080380,1080836,1083951,1087080,1091619,1092584,1095245,1095789,1096846,1097648,1097969,1098867,1099640,1101324,1101753,1104395,1104407,1124576