Subject: svn commit: r697299 - in /hadoop/core/trunk: CHANGES.txt docs/changes.html docs/hadoop-default.html docs/hdfs_quota_admin_guide.html docs/hdfs_quota_admin_guide.pdf
Date: Sat, 20 Sep 2008 00:30:12 -0000
To: core-commits@hadoop.apache.org
From: nigel@apache.org
Reply-To: core-dev@hadoop.apache.org
Message-Id: <20080920003013.53608238889D@eris.apache.org>

Author: nigel
Date: Fri Sep 19 17:30:12 2008
New Revision: 697299

URL: http://svn.apache.org/viewvc?rev=697299&view=rev
Log: Preparing for release 0.19.0

Modified:
    hadoop/core/trunk/CHANGES.txt
    hadoop/core/trunk/docs/changes.html
    hadoop/core/trunk/docs/hadoop-default.html
    hadoop/core/trunk/docs/hdfs_quota_admin_guide.html
    hadoop/core/trunk/docs/hdfs_quota_admin_guide.pdf

Modified: hadoop/core/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/CHANGES.txt?rev=697299&r1=697298&r2=697299&view=diff
==============================================================================
--- hadoop/core/trunk/CHANGES.txt (original)
+++ hadoop/core/trunk/CHANGES.txt Fri Sep 19 17:30:12 2008
@@ -1,6 +1,6 @@
 Hadoop Change Log
-Trunk (unreleased changes)
+Release 0.19.0 - Unreleased
 INCOMPATIBLE CHANGES

Modified: hadoop/core/trunk/docs/changes.html
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/docs/changes.html?rev=697299&r1=697298&r2=697299&view=diff
==============================================================================
--- hadoop/core/trunk/docs/changes.html (original)
+++ hadoop/core/trunk/docs/changes.html Fri Sep 19 17:30:12 2008
@@ -36,7 +36,7 @@
     function collapse() {
       for (var i = 0; i < document.getElementsByTagName("ul").length; i++) {
         var list = document.getElementsByTagName("ul")[i];
-        if (list.id != 'trunk_(unreleased_changes)_' && list.id != 'release_0.18.1_-_2008-09-17_') {
+        if (list.id != 'release_0.19.0_-_unreleased_' && list.id != 'release_0.18.1_-_2008-09-17_') {
           list.style.display = "none";
         }
       }
@@ -52,12 +52,12 @@
 Hadoop

Hadoop Change Log

-

Trunk (unreleased changes) +

Release 0.19.0 - Unreleased

-
    -
  • INCOMPATIBLE CHANGES -   (16) -
      +
        +
      • INCOMPATIBLE CHANGES +   (18) +
        1. HADOOP-3595. Remove deprecated methods for mapred.combine.once functionality, which was necessary to provide backwards-compatible combiner semantics for 0.18.
          (cdouglas via omalley)
        2. @@ -103,11 +103,18 @@
        3. HADOOP-4007. Remove DFSFileInfo - FileStatus is sufficient.
          (Sanjay Radia via hairong)
        4. HADOOP-3722. Fixed Hadoop Streaming and Hadoop Pipes to use the Tool interface and GenericOptionsParser.
          (Enis Soztutar via acmurthy)
        5. +
        6. HADOOP-2816. Cluster summary at name node web reports the space +utilization as: +Configured Capacity: capacity of all the data directories - Reserved space +Present Capacity: Space available for dfs, i.e. remaining+used space +DFS Used%: DFS used space/Present Capacity
          (Suresh Srinivas via hairong)
        7. +
        8. HADOOP-3938. Disk space quotas for HDFS. This is similar to namespace +quotas in 0.18.
          (rangadi)
      • -
      • NEW FEATURES -   (31) -
          +
        1. NEW FEATURES +   (39) +
          1. HADOOP-3341. Allow streaming jobs to specify the field separator for map and reduce input and output. The new configuration values are: stream.map.input.field.separator @@ -173,12 +180,30 @@ directory.
            (hairong via szetszwo)
          2. HADOOP-3981. Implement a distributed file checksum algorithm in HDFS and change DistCp to use file checksum for comparing src and dst files
            (szetszwo)
          3. +
          4. HADOOP-3829. Narrow down skipped records based on user acceptable value.
            (Sharad Agarwal via ddas)
          5. +
          6. HADOOP-3930. Add common interfaces for the pluggable schedulers and the +cli & gui clients.
            (Sreekanth Ramakrishnan via omalley)
          7. +
          8. HADOOP-4176. Implement getFileChecksum(Path) in HftpFileSystem.
            (szetszwo)
          9. +
          10. HADOOP-249. Reuse JVMs across Map-Reduce Tasks. +Configuration changes to hadoop-default.xml: + add mapred.job.reuse.jvm.num.tasks
            (Devaraj Das via acmurthy)
          11. +
          12. HADOOP-4070. Provide a mechanism in Hive for registering UDFs from the +query language.
            (tomwhite)
          13. +
          14. HADOOP-2536. Implement a JDBC based database input and output formats to +allow Map-Reduce applications to work with databases.
            (Fredrik Hedberg and +Enis Soztutar via acmurthy)
          15. +
          16. HADOOP-3019. A new library to support total order partitions.
            (cdouglas via omalley)
          17. +
          18. HADOOP-3924. Added a 'KILLED' job status.
            (Subramaniam Krishnan via +acmurthy)
        2. -
        3. IMPROVEMENTS -   (55) -
            -
          1. HADOOP-3908. Fuse-dfs: better error message if libhdfs.so doesn't exist.
            (Pete Wyckoff through zshao)
          2. +
          3. IMPROVEMENTS +   (68) +
              +
            1. HADOOP-4205. hive: metastore and ql to use the refactored SerDe library.
              (zshao)
            2. +
            3. HADOOP-4106. libhdfs: add time, permission and user attribute support (part 2).
              (Pete Wyckoff through zshao)
            4. +
            5. HADOOP-4104. libhdfs: add time, permission and user attribute support.
              (Pete Wyckoff through zshao)
            6. +
            7. HADOOP-3908. libhdfs: better error message if libhdfs.so doesn't exist.
              (Pete Wyckoff through zshao)
            8. HADOOP-3732. Delay initialization of datanode block verification till the verification thread is started.
              (rangadi)
            9. HADOOP-1627. Various small improvements to 'dfsadmin -report' output.
              (rangadi)
            10. @@ -278,11 +303,29 @@ omalley)
            11. HADOOP-4184. Break the module dependencies between core, hdfs, and mapred.
              (tomwhite via omalley)
            12. +
            13. HADOOP-4075. test-patch.sh now spits out ant commands that it runs.
              (Ramya R via nigel)
            14. +
            15. HADOOP-4117. Improve configurability of Hadoop EC2 instances.
              (tomwhite)
            16. +
            17. HADOOP-2411. Add support for larger CPU EC2 instance types.
              (Chris K Wensel via tomwhite)
            18. +
            19. HADOOP-4083. Changed the configuration attribute queue.name to +mapred.job.queue.name.
              (Hemanth Yamijala via acmurthy)
            20. +
            21. HADOOP-4194. Added the JobConf and JobID to job-related methods in +JobTrackerInstrumentation for better metrics.
              (Mac Yang via acmurthy)
            22. +
            23. HADOOP-3975. Change test-patch script to report the working dir +modifications preventing the suite from being run.
              (Ramya R via cdouglas)
            24. +
            25. HADOOP-4124. Added a command-line switch to allow users to set job +priorities, also allow it to be manipulated via the web-ui.
              (Hemanth +Yamijala via acmurthy)
            26. +
            27. HADOOP-2165. Augmented JobHistory to include the URIs to the tasks' +userlogs.
              (Vinod Kumar Vavilapalli via acmurthy)
            28. +
            29. HADOOP-4062. Remove the synchronization on the output stream when a +connection is closed and also remove an undesirable exception when +a client is stopped while there is no pending RPC request.
              (hairong)
            30. +
            31. HADOOP-4227. Remove the deprecated class org.apache.hadoop.fs.ShellCommand.
              (szetszwo)
          4. -
          5. OPTIMIZATIONS -   (8) -
              +
            1. OPTIMIZATIONS +   (9) +
              1. HADOOP-3556. Removed lock contention in MD5Hash by replacing the singleton MessageDigester with an instance per Thread using ThreadLocal.
                (Iván de Prado via omalley)
              2. @@ -300,11 +343,12 @@ GenericMRLoadGenerator public, so they can be used in other contexts.
                (Lingyun Yang via omalley)
              3. HADOOP-3514. Inline the CRCs in intermediate files as opposed to reading it from a different .crc file.
                (Jothi Padmanabhan via ddas)
              4. +
              5. HADOOP-3638. Caches the iFile index files in memory to reduce seeks
                (Jothi Padmanabhan via ddas)
            2. -
            3. BUG FIXES -   (79) -
                +
              1. BUG FIXES +   (88) +
                1. HADOOP-3563. Refactor the distributed upgrade code so that it is easier to identify datanode and namenode related code.
                  (dhruba)
                2. HADOOP-3640. Fix the read method in the NativeS3InputStream.
                  (tomwhite via @@ -452,6 +496,21 @@
                3. HADOOP-4138. Refactor the Hive SerDe library to better structure the interfaces to the serializer and de-serializer.
                  (Zheng Shao via dhruba)
                4. HADOOP-4195. Close compressor before returning to codec pool.
                  (acmurthy via omalley)
                5. +
                6. HADOOP-2403. Escapes some special characters before logging to +history files.
                  (Amareshwari Sriramadasu via ddas)
                7. +
                8. HADOOP-4200. Fix a bug in the test-patch.sh script.
                  (Ramya R via nigel)
                9. +
                10. HADOOP-4084. Add explain plan capabilities to Hive Query Language.
                  (Ashish Thusoo via dhruba)
                11. +
                12. HADOOP-4121. Preserve cause for exception if the initialization of +HistoryViewer for JobHistory fails.
                  (Amareshwari Sri Ramadasu via +acmurthy)
                13. +
                14. HADOOP-4213. Fixes NPE in TestLimitTasksPerJobTaskScheduler.
                  (Sreekanth Ramakrishnan via ddas)
                15. +
                16. HADOOP-4077. Setting access and modification time for a file +requires write permissions on the file.
                  (dhruba)
                17. +
                18. HADOOP-3592. Fix a couple of possible file leaks in FileUtil
                  (Bill de hOra via rangadi)
                19. +
                20. HADOOP-4120. Hive interactive shell records the time taken by a +query.
                  (Raghotham Murthy via dhruba)
                21. +
                22. HADOOP-4090. The hive scripts pick up hadoop from HADOOP_HOME +and then the path.
                  (Raghotham Murthy via dhruba)
Modified: hadoop/core/trunk/docs/hadoop-default.html
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/docs/hadoop-default.html?rev=697299&r1=697298&r2=697299&view=diff
==============================================================================
--- hadoop/core/trunk/docs/hadoop-default.html (original)
+++ hadoop/core/trunk/docs/hadoop-default.html Fri Sep 19 17:30:12 2008
@@ -600,6 +600,11 @@
  may be executed in parallel.
+mapred.job.reuse.jvm.num.tasks | 1 | How many tasks to run per jvm. If set to -1, there is
+ no limit.
 mapred.min.split.size | 0 | The minimum size chunk that map input should be split into. Note that some file formats may have minimum split sizes that take priority over this setting.
@@ -749,12 +754,6 @@
-mapred.skip.mode.enabled | false | Indicates whether skipping of bad records is enabled or not.
- If enabled the framework will try to find bad records and skip
- them on further attempts.
 mapred.skip.attempts.to.start.skipping | 2 | The number of Task attempts AFTER which skip mode will be kicked off. When skip mode is kicked off, the tasks reports the range of records which it will process
@@ -765,7 +764,7 @@
 mapred.skip.map.auto.incr.proc.count | true | The flag which if set to true,
- Counters.Application.MAP_PROCESSED_RECORDS is incremented
+ SkipBadRecords.COUNTER_MAP_PROCESSED_RECORDS is incremented
  by MapRunner after invoking the map function. This value must be set to false for applications which process the records asynchronously or buffer the input records. For example streaming.
@@ -774,7 +773,7 @@
 mapred.skip.reduce.auto.incr.proc.count | true | The flag which if set to true,
- Counters.Application.REDUCE_PROCESSED_RECORDS is incremented
+ SkipBadRecords.COUNTER_REDUCE_PROCESSED_GROUPS is incremented
  by framework after invoking the reduce function. This value must be set to false for applications which process the records asynchronously or buffer the input records. For example streaming.
@@ -782,6 +781,36 @@
+mapred.skip.out.dir |  | If no value is specified here, the skipped records are
+ written to the output directory at _logs/skip.
+ User can stop writing skipped records by giving the value "none".
+mapred.skip.map.max.skip.records | 0 | The number of acceptable skip records surrounding the bad
+ record PER bad record in mapper. The number includes the bad record as well.
+ To turn the feature of detection/skipping of bad records off, set the
+ value to 0.
+ The framework tries to narrow down the skipped range by retrying
+ until this threshold is met OR all attempts get exhausted for this task.
+ Set the value to Long.MAX_VALUE to indicate that framework need not try to
+ narrow down. Whatever records(depends on application) get skipped are
+ acceptable.
+mapred.skip.reduce.max.skip.groups | 0 | The number of acceptable skip groups surrounding the bad
+ group PER bad group in reducer. The number includes the bad group as well.
+ To turn the feature of detection/skipping of bad groups off, set the
+ value to 0.
+ The framework tries to narrow down the skipped range by retrying
+ until this threshold is met OR all attempts get exhausted for this task.
+ Set the value to Long.MAX_VALUE to indicate that framework need not try to
+ narrow down. Whatever groups(depends on application) get skipped are
+ acceptable.
 ipc.client.idlethreshold | 4000 | Defines the threshold number of connections after which connections will be inspected for idleness.
@@ -942,13 +971,18 @@
-queue.name | default | Queue to which a job is submitted. This must match one of the
+mapred.job.queue.name | default | Queue to which a job is submitted. This must match one of the
  queues defined in mapred.queue.names for the system. Also, the ACL setup for the queue must allow the current user to submit a job to the queue. Before specifying a queue, ensure that the system is configured with the queue, and access is allowed for submitting jobs to the queue.
+mapred.tasktracker.indexcache.mb | 10 | The maximum memory that a task tracker allows for the
+ index cache that is used when serving map outputs to reducers.

Modified: hadoop/core/trunk/docs/hdfs_quota_admin_guide.html
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/docs/hdfs_quota_admin_guide.html?rev=697299&r1=697298&r2=697299&view=diff
==============================================================================
--- hadoop/core/trunk/docs/hdfs_quota_admin_guide.html (original)
+++ hadoop/core/trunk/docs/hdfs_quota_admin_guide.html Fri Sep 19 17:30:12 2008
@@ -5,9 +5,7 @@
-Name Space Quotas Administrator Guide
+Directory Quotas Administrator's Guide
@@ -192,77 +190,124 @@
 PDF -icon
      PDF
      -

      - Name Space Quotas Administrator Guide -

      - -

      - The Hadoop Distributed File System (HDFS) allows the administrator to set quotas on individual directories. - Newly created directories have no associated quota. - The largest quota is Long.Max_Value. A quota of one forces a directory - to remain empty. -

      - - -

      - The directory quota is a hard limit on the number of names in the tree - rooted at that directory. File and directory creations fault if the quota - would be exceeded. Quotas stick to renamed directories; the rename - operation faults if operation would result in a quota violation. - The attempt to set a quota faults if the directory would be in violation - of the new quota. -

      - - -

      - Quotas are persistent with the fsimage. When starting, if the fsimage - is immediately in violation of a quota (perhaps the fsimage was - surreptitiously modified), the startup operation fails with an error report. - Setting or removing a quota creates a journal entry. -

      - - -

      - The following new commands or new options are added to support quotas. - The first two are administration commands. -

      +

      Directory Quotas Administrator's Guide

      + + + +

      The Hadoop Distributed File System (HDFS) allows the administrator to set quotas for the number of names used and the +amount of space used for individual directories. Name quotas and space quotas operate independently, but the administration and +implementation of the two types of quotas are closely parallel.

      + + + +

      Name Quotas

      +
      +

      The name quota is a hard limit on the number of file and directory names in the tree rooted at that directory. File and +directory creations fail if the quota would be exceeded. Quotas stick with renamed directories; the rename operation fails if +the operation would result in a quota violation. The attempt to set a quota fails if the directory would be in violation of the new +quota. A newly created directory has no associated quota. The largest quota is Long.MAX_VALUE. A quota of one +forces a directory to remain empty. (Yes, a directory counts against its own quota!)

      +

      Quotas are persistent with the fsimage. When starting, if the fsimage is immediately in +violation of a quota (perhaps the fsimage was surreptitiously modified), +a warning is printed for each such violation. Setting or removing a quota creates a journal entry.
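
The name-quota rule described above can be illustrated with a toy model. This is a minimal sketch, not the NameNode implementation: the dict-based tree and the helper names `names_in_tree` and `can_create` are hypothetical, chosen only to show why a quota of one keeps a directory empty.

```python
# Toy model of an HDFS name quota (illustrative only, not NameNode code).
# A directory is a dict mapping child names to sub-dicts (directories)
# or None (files). Both helper names are hypothetical.

def names_in_tree(tree):
    """Count the names in the tree; the root directory itself counts as one."""
    count = 1  # the directory counts against its own quota
    for child in tree.values():
        if isinstance(child, dict):
            count += names_in_tree(child)  # subdirectory plus its contents
        else:
            count += 1  # a file uses one name
    return count


def can_create(tree, quota):
    """Return True if one more file or directory name fits within the quota.

    A quota of None models a directory with no quota set.
    """
    return quota is None or names_in_tree(tree) + 1 <= quota


empty_dir = {}
print(can_create(empty_dir, 1))  # quota of one: the directory must stay empty
print(can_create(empty_dir, 2))  # quota of two: room for exactly one child
```

Counting the root directory first is what makes the "quota of one" case fall out naturally, with no special handling.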

      +
      + + + +

      Space Quotas

      +
      +

      The space quota is a hard limit on the number of bytes used by files in the tree rooted at that directory. Block +allocations fail if the quota would not allow a full block to be written. Each replica of a block counts against the quota. Quotas +stick with renamed directories; the rename operation fails if the operation would result in a quota violation. The attempt to +set a quota fails if the directory would be in violation of the new quota. A newly created directory has no associated quota. +The largest quota is Long.MAX_VALUE. A quota of zero still permits files to be created, but no blocks can be added to the files. +Directories don't use host file system space and don't count against the space quota. The host file system space used to save +the file metadata is not counted against the quota. Quotas are charged at the intended replication factor for the file; +changing the replication factor for a file will credit or debit quotas.

      +

      Quotas are persistent with the fsimage. When starting, if the fsimage is immediately in +violation of a quota (perhaps the fsimage was surreptitiously modified), a warning is printed for +each such violation. Setting or removing a quota creates a journal entry.
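
The space-quota accounting described above comes down to simple arithmetic. The sketch below is a simplified model under stated assumptions, not HDFS code: the function names are hypothetical, and the 64 MB block size and replication factor of 3 are example values rather than anything this commit specifies.

```python
# Simplified arithmetic for HDFS space quotas (illustrative only).
# Function names are hypothetical; block size and replication are examples.

GB = 2 ** 30
MB = 2 ** 20


def charged_bytes(file_length, replication):
    """Bytes charged to the quota: every replica counts."""
    return file_length * replication


def can_allocate_block(quota, used, block_size, replication):
    """A block allocation succeeds only if a full block, at the intended
    replication factor, still fits under the quota (None means no quota)."""
    if quota is None:
        return True
    return used + block_size * replication <= quota


quota = 1 * GB
per_block = charged_bytes(64 * MB, 3)  # 192 MB charged per block

# After four blocks, 768 MB is charged and a fifth block still fits.
print(can_allocate_block(quota, 4 * per_block, 64 * MB, 3))
# After five blocks, 960 MB is charged and a sixth would exceed 1 GB.
print(can_allocate_block(quota, 5 * per_block, 64 * MB, 3))
# A quota of zero never admits a block, although files can still be created.
print(can_allocate_block(0, 0, 64 * MB, 1))
```

Lowering a file's replication factor from 3 to 2 would credit `charged_bytes(length, 3) - charged_bytes(length, 2)` bytes back to the quota, in line with the last sentence of the paragraph above.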

      +
      + - + +

      Administrative Commands

      +
      +

      Quotas are managed by a set of commands available only to the administrator.

        - -
      • - + + +
      • dfsadmin -setquota <N> <directory>...<directory> - -
        - Set the quota to be N for each directory. Best effort for each directory, - with faults reported if N is not a positive long integer, - the directory does not exist or it is a file, or the directory would - immediately exceed the new quota. -
      • - - -
      • - +
        Set the name quota to be N for +each directory. Best effort for each directory, with faults reported if N is not a positive long integer, the +directory does not exist or it is a file, or the directory would immediately exceed the new quota.
      • + + +
      • dfsadmin -clrquota <directory>...<directory> -
        - Remove any quota for each directory. Best effort for each directory, - with faults reported if the directory does not exist or it is a file. - It is not a fault if the directory has no quota. -
      • - - +
        Remove any name quota for each directory. Best +effort for each directory, with faults reported if the directory does not exist or it is a file. It is not a fault if the +directory has no quota. + + +
      • +dfsadmin -setspacequota <N> <directory>...<directory> +
        Set the space quota to be +N × 2^30 bytes (GB) for each directory. Best effort for each directory, with faults reported if N is +neither zero nor a positive integer, the directory does not exist or it is a file, or the directory would immediately exceed +the new quota.
      • + + +
      • +dfsadmin -clrspacequota <directory>...<directory> +
        Remove any space quota for each directory. Best +effort for each directory, with faults reported if the directory does not exist or it is a file. It is not a fault if the +directory has no quota.
      • + + +
      +
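
The argument rules for the two set commands above differ slightly: a name quota must be a positive long, while a space quota may also be zero and is given in units of 2^30 bytes. A minimal sketch of those checks follows. The helper names are hypothetical and this is not the dfsadmin source; the upper bound reflects Java's Long.MAX_VALUE.

```python
# Sketch of the argument rules for -setquota and -setspacequota as
# described in the guide. Helper names are hypothetical.

GB = 2 ** 30
LONG_MAX = 2 ** 63 - 1  # Java's Long.MAX_VALUE, the largest quota


def valid_name_quota(n):
    """A name quota must be a positive long integer."""
    return isinstance(n, int) and 1 <= n <= LONG_MAX


def space_quota_bytes(n):
    """Convert the N given to -setspacequota into bytes (N x 2^30).

    N must be zero or a positive integer; anything else is reported as a fault.
    """
    if not isinstance(n, int) or n < 0:
        raise ValueError("space quota must be zero or a positive integer")
    return n * GB


print(valid_name_quota(0))    # zero is rejected for name quotas
print(space_quota_bytes(10))  # ten units of 2^30 bytes
```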
      + + + +

      Reporting Command

      +
      +

      An extension to the count command of the HDFS shell reports quota values and the current count of names and bytes in use.

      +
        + +
      • - + + fs -count -q <directory>...<directory> -
        - With the -q option, also report the quota value set for each - directory, and the available quota remaining. If the directory does not have - a quota set, the reported values are none and inf. -
      • - +
        With the -q option, also report the name quota +value set for each directory, the available name quota remaining, the space quota value set, and the available space quota +remaining. If the directory does not have a quota set, the reported values are none and inf. Space +values are rounded to multiples of 2^30 bytes (GB). + + + +
      - +
      + +
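
The reporting rule for `fs -count -q` can be sketched as follows. This is an assumption-laden illustration: `quota_columns` is a hypothetical helper, the real command's column layout is not reproduced, and the rounding of space values to multiples of 2^30 bytes is left out because the guide does not say how the rounding is done.

```python
# Sketch of the quota columns reported by "fs -count -q" (illustrative only).
# quota_columns is a hypothetical helper, not part of the HDFS shell.

def quota_columns(quota, used):
    """Return (quota, remaining) as strings; 'none' and 'inf' when unset."""
    if quota is None:
        return ("none", "inf")
    return (str(quota), str(quota - used))


print(quota_columns(None, 42))  # directory without a quota
print(quota_columns(100, 42))   # quota of 100 names, 42 in use
```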