INCOMPATIBLE CHANGES - (16) -

INCOMPATIBLE CHANGES + (18) +
1. HADOOP-3595. Remove deprecated methods for mapred.combine.once functionality, which was necessary to provide backwards-compatible combiner semantics for 0.18.
  (cdouglas via omalley)
2. HADOOP-4007. Remove DFSFileInfo - FileStatus is sufficient.
  (Sanjay Radia via hairong)
3. HADOOP-3722. Fixed Hadoop Streaming and Hadoop Pipes to use the Tool interface and GenericOptionsParser.
  (Enis Soztutar via acmurthy)
4. HADOOP-2816. Cluster summary at name node web reports the space utilization as: Configured Capacity: capacity of all the data directories - Reserved space; Present Capacity: Space available for dfs, i.e. remaining+used space; DFS Used%: DFS used space/Present Capacity
  (Suresh Srinivas via hairong)
5. HADOOP-3938. Disk space quotas for HDFS. This is similar to namespace quotas in 0.18.
  (rangadi)
NEW FEATURES - (31) -
1. NEW FEATURES + (39) +
  1. HADOOP-3341. Allow streaming jobs to specify the field separator for map and reduce input and output. The new configuration values are: stream.map.input.field.separator @@ -173,12 +180,30 @@ directory.
    (hairong via szetszwo)
  2. HADOOP-3981. Implement a distributed file checksum algorithm in HDFS and change DistCp to use file checksum for comparing src and dst files
    (szetszwo)
  3. HADOOP-3829. Narrow down skipped records based on user acceptable value.
    (Sharad Agarwal via ddas)
  4. HADOOP-3930. Add common interfaces for the pluggable schedulers and the cli & gui clients.
    (Sreekanth Ramakrishnan via omalley)
  5. HADOOP-4176. Implement getFileChecksum(Path) in HftpFileSystem.
    (szetszwo)
  6. HADOOP-249. Reuse JVMs across Map-Reduce Tasks. Configuration changes to hadoop-default.xml: add mapred.job.reuse.jvm.num.tasks
    (Devaraj Das via acmurthy)
  7. HADOOP-4070. Provide a mechanism in Hive for registering UDFs from the query language.
    (tomwhite)
  8. HADOOP-2536. Implement JDBC-based database input and output formats to allow Map-Reduce applications to work with databases.
    (Fredrik Hedberg and Enis Soztutar via acmurthy)
  9. HADOOP-3019. A new library to support total order partitions.
    (cdouglas via omalley)
  10. HADOOP-3924. Added a 'KILLED' job status.
    (Subramaniam Krishnan via acmurthy)
2. IMPROVEMENTS - (55) -
  1. HADOOP-3908. Fuse-dfs: better error message if libhdfs.so doesn't exist.
    (Pete Wyckoff through zshao)
  2. IMPROVEMENTS + (68) +
    1. HADOOP-4205. hive: metastore and ql to use the refactored SerDe library.
      (zshao)
    2. HADOOP-4106. libhdfs: add time, permission and user attribute support (part 2).
      (Pete Wyckoff through zshao)
    3. HADOOP-4104. libhdfs: add time, permission and user attribute support.
      (Pete Wyckoff through zshao)
    4. HADOOP-3908. libhdfs: better error message if libhdfs.so doesn't exist.
      (Pete Wyckoff through zshao)
    5. HADOOP-3732. Delay initialization of datanode block verification till the verification thread is started.
      (rangadi)
    6. HADOOP-1627. Various small improvements to 'dfsadmin -report' output.
      (rangadi)
    7. HADOOP-4184. Break the module dependencies between core, hdfs, and mapred.
      (tomwhite via omalley)
    8. HADOOP-4075. test-patch.sh now spits out ant commands that it runs.
      (Ramya R via nigel)
    9. HADOOP-4117. Improve configurability of Hadoop EC2 instances.
      (tomwhite)
    10. HADOOP-2411. Add support for larger CPU EC2 instance types.
      (Chris K Wensel via tomwhite)
    11. HADOOP-4083. Changed the configuration attribute queue.name to mapred.job.queue.name.
      (Hemanth Yamijala via acmurthy)
    12. HADOOP-4194. Added the JobConf and JobID to job-related methods in JobTrackerInstrumentation for better metrics.
      (Mac Yang via acmurthy)
    13. HADOOP-3975. Change test-patch script to report the working dir modifications preventing the suite from being run.
      (Ramya R via cdouglas)
    14. HADOOP-4124. Added a command-line switch to allow users to set job priorities; also allow them to be manipulated via the web-ui.
      (Hemanth Yamijala via acmurthy)
    15. HADOOP-2165. Augmented JobHistory to include the URIs to the tasks' userlogs.
      (Vinod Kumar Vavilapalli via acmurthy)
    16. HADOOP-4062. Remove the synchronization on the output stream when a connection is closed and also remove an undesirable exception when a client is stopped while there is no pending RPC request.
      (hairong)
    17. HADOOP-4227. Remove the deprecated class org.apache.hadoop.fs.ShellCommand.
      (szetszwo)
  3. OPTIMIZATIONS - (8) -
    1. OPTIMIZATIONS + (9) +
      1. HADOOP-3556. Removed lock contention in MD5Hash by replacing the singleton MessageDigester with an instance per Thread using ThreadLocal.
        (Iván de Prado via omalley)
      2. HADOOP-3514. Inline the CRCs in intermediate files as opposed to reading it from a different .crc file.
        (Jothi Padmanabhan via ddas)
      3. HADOOP-3638. Caches the iFile index files in memory to reduce seeks
        (Jothi Padmanabhan via ddas)
    2. BUG FIXES - (79) -
      1. BUG FIXES + (88) +
        
        HADOOP-3563. Refactor the distributed upgrade code so that it is easier to identify datanode and namenode related code.
        (dhruba)
        
        HADOOP-3640. Fix the read method in the NativeS3InputStream.
        (tomwhite via @@ -452,6 +496,21 @@
        HADOOP-4138. Refactor the Hive SerDe library to better structure the interfaces to the serializer and de-serializer.
        (Zheng Shao via dhruba)
        
        HADOOP-4195. Close compressor before returning to codec pool.
        (acmurthy via omalley)

        HADOOP-2403. Escapes some special characters before logging to history files.
        (Amareshwari Sriramadasu via ddas)

        HADOOP-4200. Fix a bug in the test-patch.sh script.
        (Ramya R via nigel)

        HADOOP-4084. Add explain plan capabilities to Hive Query Language.
        (Ashish Thusoo via dhruba)

        HADOOP-4121. Preserve cause for exception if the initialization of HistoryViewer for JobHistory fails.
        (Amareshwari Sri Ramadasu via acmurthy)

        HADOOP-4213. Fixes NPE in TestLimitTasksPerJobTaskScheduler.
        (Sreekanth Ramakrishnan via ddas)

        HADOOP-4077. Setting access and modification time for a file requires write permissions on the file.
        (dhruba)

        HADOOP-3592. Fix a couple of possible file leaks in FileUtil.
        (Bill de hOra via rangadi)

        HADOOP-4120. Hive interactive shell records the time taken by a query.
        (Raghotham Murthy via dhruba)

        HADOOP-4090. The hive scripts pick up hadoop from HADOOP_HOME and then the path.
        (Raghotham Murthy via dhruba)

mapred.min.split.size

mapred.skip.mode.enabled

mapred.skip.attempts.to.start.skipping

mapred.skip.map.auto.incr.proc.count

mapred.skip.reduce.auto.incr.proc.count

mapred.skip.out.dir

mapred.skip.map.max.skip.records

mapred.skip.reduce.max.skip.groups

ipc.client.idlethreshold

queue.name

mapred.job.queue.name

mapred.tasktracker.indexcache.mb

- Name Space Quotas Administrator Guide -

Directory Quotas Administrator's Guide

- The Hadoop Distributed File System (HDFS) allows the administrator to set quotas on individual directories. - Newly created directories have no associated quota. - The largest quota is Long.Max_Value. A quota of one forces a directory - to remain empty. -

- The directory quota is a hard limit on the number of names in the tree - rooted at that directory. File and directory creations fault if the quota - would be exceeded. Quotas stick to renamed directories; the rename - operation faults if operation would result in a quota violation. - The attempt to set a quota faults if the directory would be in violation - of the new quota. -

- Quotas are persistent with the fsimage. When starting, if the fsimage - is immediately in violation of a quota (perhaps the fsimage was - surreptitiously modified), the startup operation fails with an error report. - Setting or removing a quota creates a journal entry. -

- The following new commands or new options are added to support quotas. - The first two are administration commands. -

Directory Quotas Administrator's Guide

Name Quotas
Space Quotas
Administrative Commands
Reporting Command

The Hadoop Distributed File System (HDFS) allows the administrator to set quotas for the number of names used and the amount of space used for individual directories. Name quotas and space quotas operate independently, but the administration and implementation of the two types of quotas are closely parallel.

Name Quotas

The name quota is a hard limit on the number of file and directory names in the tree rooted at that directory. File and directory creations fail if the quota would be exceeded. Quotas stick with renamed directories; the rename operation fails if the operation would result in a quota violation. The attempt to set a quota fails if the directory would be in violation of the new quota. A newly created directory has no associated quota. The largest quota is Long.MAX_VALUE. A quota of one forces a directory to remain empty. (Yes, a directory counts against its own quota!)
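The counting rule can be sketched with a toy in-memory tree. This is only an illustration of the semantics described above, not HDFS code: the `Dir` class, its method names, and `QuotaExceeded` are hypothetical, and renames are left out.

```python
class QuotaExceeded(Exception):
    """Raised when an operation would violate a name quota."""


class Dir:
    """Toy directory node (hypothetical, not an HDFS class)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}   # child name -> Dir, or the string "file"
        self.quota = None    # a newly created directory has no quota

    def count(self):
        # The directory itself is one name, plus every name below it.
        return 1 + sum(c.count() if isinstance(c, Dir) else 1
                       for c in self.children.values())

    def set_quota(self, n):
        if n < 1:
            raise ValueError("name quota must be a positive integer")
        # Setting a quota fails if the tree already violates it.
        if self.count() > n:
            raise QuotaExceeded(f"{self.name} already holds {self.count()} names")
        self.quota = n

    def _check_room(self):
        # Every ancestor with a quota must have room for one more name.
        node = self
        while node is not None:
            if node.quota is not None and node.count() + 1 > node.quota:
                raise QuotaExceeded(f"name quota of {node.name} exceeded")
            node = node.parent

    def mkdir(self, name):
        self._check_room()
        child = Dir(name, parent=self)
        self.children[name] = child
        return child

    def create_file(self, name):
        self._check_room()
        self.children[name] = "file"
```

With a quota of one, `count()` is already 1 for an empty directory, so any `mkdir` or `create_file` call inside it raises `QuotaExceeded`.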

Quotas are persistent with the fsimage. When starting, if the fsimage is immediately in violation of a quota (perhaps the fsimage was surreptitiously modified), a warning is printed for each such violation. Setting or removing a quota creates a journal entry.

Space Quotas

The space quota is a hard limit on the number of bytes used by files in the tree rooted at that directory. Block allocations fail if the quota would not allow a full block to be written. Each replica of a block counts against the quota. Quotas stick with renamed directories; the rename operation fails if the operation would result in a quota violation. The attempt to set a quota fails if the directory would be in violation of the new quota. A newly created directory has no associated quota. The largest quota is Long.MAX_VALUE. A quota of zero still permits files to be created, but no blocks can be added to the files. Directories don't use host file system space and don't count against the space quota. The host file system space used to save the file metadata is not counted against the quota. Quotas are charged at the intended replication factor for the file; changing the replication factor for a file will credit or debit quotas.
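The arithmetic behind these rules can be sketched as follows. The function names are hypothetical, and the 64 MB block size and replication factor of 3 used in the example are assumptions chosen for illustration, not values taken from this guide.

```python
GB = 2 ** 30
MB = 2 ** 20


def can_allocate_block(quota_bytes, used_bytes, block_size, replication):
    """Return True if one more full block fits under the space quota.

    Every replica counts, so a block is charged block_size * replication.
    """
    return used_bytes + block_size * replication <= quota_bytes


def max_full_blocks(quota_bytes, block_size, replication):
    """Number of full blocks that fit before the next allocation fails."""
    return quota_bytes // (block_size * replication)


def replication_change_delta(file_bytes, old_replication, new_replication):
    """Bytes debited (positive) or credited (negative) when replication changes."""
    return file_bytes * (new_replication - old_replication)


# Assumed example: 1 GB quota, 64 MB blocks, replication 3.
# Each block is charged 192 MB, so five blocks (960 MB) fit and the sixth does not.
blocks = max_full_blocks(1 * GB, 64 * MB, 3)
sixth_fits = can_allocate_block(1 * GB, blocks * 64 * MB * 3, 64 * MB, 3)
```

Under the same assumptions, lowering a 128 MB file from replication 3 to 2 credits 128 MB back to the quota, and a quota of zero rejects every block allocation while leaving file creation itself unaffected.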

Quotas are persistent with the fsimage. When starting, if the fsimage is immediately in violation of a quota (perhaps the fsimage was surreptitiously modified), a warning is printed for each such violation. Setting or removing a quota creates a journal entry.

Administrative Commands

Quotas are managed by a set of commands available only to the administrator.

dfsadmin -setquota <N> <directory>...<directory>
Set the name quota to be N for each directory. Best effort for each directory, with faults reported if N is not a positive long integer, the directory does not exist or it is a file, or the directory would immediately exceed the new quota.

dfsadmin -clrquota <directory>...<directory>
Remove any quota for each directory. Best effort for each directory, with faults reported if the directory does not exist or it is a file. It is not a fault if the directory has no quota.

dfsadmin -setspacequota <N> <directory>...<directory>
Set the space quota to be N×2³⁰ bytes (GB) for each directory. Best effort for each directory, with faults reported if N is neither zero nor a positive integer, the directory does not exist or it is a file, or the directory would immediately exceed the new quota.

dfsadmin -clrspacequota <directory>...<directory>
Remove any space quota for each directory. Best effort for each directory, with faults reported if the directory does not exist or it is a file. It is not a fault if the directory has no quota.
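A script that drives these commands might validate N before invoking them. The sketch below only builds the argument list and runs nothing; the option names are the ones listed above, while the `hadoop` launcher name, the `quota_command` helper, and its action keys are assumptions made for illustration.

```python
GB = 2 ** 30  # space quotas are given in units of 2^30 bytes, per the text above

# Option names as listed in this guide; the action keys are hypothetical.
_FLAGS = {
    "set_name": "-setquota",
    "clear_name": "-clrquota",
    "set_space": "-setspacequota",
    "clear_space": "-clrspacequota",
}


def quota_command(action, directories, n=None, launcher=("hadoop", "dfsadmin")):
    """Build an argv list for one quota command, validating N first."""
    if action not in _FLAGS:
        raise ValueError(f"unknown action: {action}")
    if not directories:
        raise ValueError("at least one directory is required")
    args = list(launcher) + [_FLAGS[action]]
    if action.startswith("set_"):
        if isinstance(n, bool) or not isinstance(n, int):
            raise ValueError("N must be an integer")
        # Name quotas must be positive; space quotas may also be zero.
        minimum = 1 if action == "set_name" else 0
        if n < minimum:
            raise ValueError(f"N must be at least {minimum} for {action}")
        args.append(str(n))
    return args + list(directories)


def space_quota_bytes(n):
    """Bytes represented by a space quota of N."""
    return n * GB
```

For example, `quota_command("set_space", ["/user/a"], 2)` yields an argument list ending in `-setspacequota 2 /user/a`, which by the description above corresponds to `space_quota_bytes(2)` bytes.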

Reporting Command

An extension to the count command of the HDFS shell reports quota values and the current count of names and bytes in use.

fs -count -q <directory>...<directory>
With the -q option, also report the quota value set for each directory, and the available quota remaining. If the directory does not have a quota set, the reported values are none and inf.
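A script consuming this report has to handle the two placeholder values. The helper below is a minimal sketch that assumes the caller has already extracted the quota and remaining-quota tokens from a report line; the column layout is not specified here, and the function name is hypothetical.

```python
import math


def parse_quota_fields(quota_token, remaining_token):
    """Convert the reported quota and remaining-quota tokens to numbers.

    "none" (no quota set) becomes None and "inf" becomes math.inf;
    anything else is read as an integer.
    """
    quota = None if quota_token == "none" else int(quota_token)
    remaining = math.inf if remaining_token == "inf" else int(remaining_token)
    return quota, remaining
```

Mapping inf to `math.inf` keeps comparisons such as `remaining >= needed` valid for directories without a quota.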


Hadoop Change Log

Release 0.19.0 - Unreleased
