Subject: svn commit: r648767 [1/3] - in /hadoop/core/branches/branch-0.17/docs: ./ skin/images/
Date: Wed, 16 Apr 2008 17:37:32 -0000
To: core-commits@hadoop.apache.org
From: acmurthy@apache.org
Reply-To: core-dev@hadoop.apache.org
Message-Id: <20080416173733.41CA31A9838@eris.apache.org>

Author: acmurthy
Date: Wed Apr 16 10:37:17 2008
New Revision: 648767

URL: http://svn.apache.org/viewvc?rev=648767&view=rev
Log:
HADOOP-3162 related documentation changes to branch-0.17

Modified:
    hadoop/core/branches/branch-0.17/docs/changes.html
    hadoop/core/branches/branch-0.17/docs/mapred_tutorial.html
    hadoop/core/branches/branch-0.17/docs/mapred_tutorial.pdf
    hadoop/core/branches/branch-0.17/docs/skin/images/rc-b-l-15-1body-2menu-3menu.png
    hadoop/core/branches/branch-0.17/docs/skin/images/rc-b-r-15-1body-2menu-3menu.png
    hadoop/core/branches/branch-0.17/docs/skin/images/rc-b-r-5-1header-2tab-selected-3tab-selected.png
    hadoop/core/branches/branch-0.17/docs/skin/images/rc-t-l-5-1header-2searchbox-3searchbox.png
    hadoop/core/branches/branch-0.17/docs/skin/images/rc-t-l-5-1header-2tab-selected-3tab-selected.png
    hadoop/core/branches/branch-0.17/docs/skin/images/rc-t-l-5-1header-2tab-unselected-3tab-unselected.png
    hadoop/core/branches/branch-0.17/docs/skin/images/rc-t-r-15-1body-2menu-3menu.png
    hadoop/core/branches/branch-0.17/docs/skin/images/rc-t-r-5-1header-2searchbox-3searchbox.png
    hadoop/core/branches/branch-0.17/docs/skin/images/rc-t-r-5-1header-2tab-selected-3tab-selected.png
    hadoop/core/branches/branch-0.17/docs/skin/images/rc-t-r-5-1header-2tab-unselected-3tab-unselected.png

Modified: hadoop/core/branches/branch-0.17/docs/changes.html
URL: http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.17/docs/changes.html?rev=648767&r1=648766&r2=648767&view=diff
==============================================================================
--- hadoop/core/branches/branch-0.17/docs/changes.html (original)
+++ hadoop/core/branches/branch-0.17/docs/changes.html Wed Apr 16 10:37:17 2008
@@ -56,7 +56,7 @@
  • INCOMPATIBLE CHANGES -   (23) +   (24)
    1. HADOOP-2786. Move hbase out of hadoop core
    2. @@ -103,6 +103,8 @@ availability zone as the cluster. Ganglia monitoring and large instance sizes have also been added.
      (Chris K Wensel via tomwhite)
    3. HADOOP-2826. Deprecated FileSplit.getFile(), LineRecordReader.readLine().
      (Amareshwari Sriramadasu via ddas)
    4. +
    5. HADOOP-3239. getFileInfo() returns null for non-existing files instead +of throwing FileNotFoundException.
      (Lohit Vijayarenu via shv)
  • NEW FEATURES @@ -232,7 +234,7 @@
  • BUG FIXES -   (93) +   (99)
    1. HADOOP-2195. '-mkdir' behaviour is now closer to Linux shell in case of errors.
      (Mahadev Konar via rangadi)
    2. @@ -400,6 +402,18 @@
    3. HADOOP-1373. checkPath() should ignore case when it compares authority.
      (Edward J. Yoon via rangadi)
    4. HADOOP-3204. Fixes a problem to do with ReduceTask's LocalFSMerger not catching Throwable.
      (Amar Ramesh Kamat via ddas)
    5. +
    6. HADOOP-3229. Report progress when collecting records from the mapper and +the combiner.
      (Doug Cutting via cdouglas)
    7. +
    8. HADOOP-3225. Unwrapping methods of RemoteException should initialize +detailedMassage field.
      (Mahadev Konar, shv, cdouglas)
    9. +
    10. HADOOP-3247. Fix gridmix scripts to use the correct globbing syntax and +change maxentToSameCluster to run the correct number of jobs.
      (Runping Qi via cdouglas)
    11. +
    12. HADOOP-3242. Fix the RecordReader of SequenceFileAsBinaryInputFormat to +correctly read from the start of the split and not the beginning of the +file.
      (cdouglas via acmurthy)
    13. +
    14. HADOOP-3256. Encodes the job name used in the filename for history files.
      (Arun Murthy via ddas)
    15. +
    16. HADOOP-3162. Ensure that comma-separated input paths are treated correctly +as multiple input paths.
      (Amareshwari Sri Ramadasu via acmurthy)
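
The HADOOP-3162 entry above concerns treating one comma-separated string as several input paths. The following is a minimal, standard-library-only sketch of that idea, not Hadoop's implementation: the class name InputPathsSketch, the method splitInputPaths, and the example paths are hypothetical, and details such as escaping or glob patterns that contain commas are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.List;

public class InputPathsSketch {

    /**
     * Splits a comma-separated string into individual path strings.
     * Assumption for illustration only: surrounding whitespace is trimmed
     * and empty entries are dropped.
     */
    static List<String> splitInputPaths(String commaSeparatedPaths) {
        List<String> paths = new ArrayList<>();
        for (String part : commaSeparatedPaths.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // Hypothetical paths; each entry is treated as its own input path.
        for (String path : splitInputPaths("/data/in1,/data/in2")) {
            System.out.println("input path: " + path);
        }
    }
}
```

The point of the sketch is only that "/data/in1,/data/in2" yields two separate inputs rather than one path whose name contains a comma.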
Modified: hadoop/core/branches/branch-0.17/docs/mapred_tutorial.html
URL: http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.17/docs/mapred_tutorial.html?rev=648767&r1=648766&r2=648767&view=diff
==============================================================================
--- hadoop/core/branches/branch-0.17/docs/mapred_tutorial.html (original)
+++ hadoop/core/branches/branch-0.17/docs/mapred_tutorial.html Wed Apr 16 10:37:17 2008
@@ -292,7 +292,7 @@
Example: WordCount v2.0
- +

Job Input

@@ -1727,7 +1730,7 @@ appropriate CompressionCodec. However, it must be noted that compressed files with the above extensions cannot be split and each compressed file is processed in its entirety by a single mapper.

- +

InputSplit

@@ -1741,7 +1744,7 @@ FileSplit is the default InputSplit. It sets map.input.file to the path of the input file for the logical split.

- +

RecordReader

@@ -1753,7 +1756,7 @@ for processing. RecordReader thus assumes the responsibility of processing record boundaries and presents the tasks with keys and values.

- +

Job Output

@@ -1778,7 +1781,7 @@

TextOutputFormat is the default OutputFormat.

- +

Task Side-Effect Files

In some applications, component tasks need to create and/or write to side-files, which differ from the actual job-output files.

@@ -1817,7 +1820,7 @@

The entire discussion holds true for maps of jobs with reducer=NONE (i.e. 0 reduces) since output of the map, in that case, goes directly to HDFS.

- +

RecordWriter

@@ -1825,9 +1828,9 @@ pairs to an output file.

RecordWriter implementations write the job outputs to the FileSystem.

-
+

Other Useful Features

- +

Counters

Counters represent global counters, defined either by @@ -1841,7 +1844,7 @@ Reporter.incrCounter(Enum, long) in the map and/or reduce methods. These counters are then globally aggregated by the framework.
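
To make the "globally aggregated" step concrete, here is a small standard-library-only sketch. It does not use the Hadoop classes: TaskCounters is a hypothetical stand-in for the per-task state behind Reporter.incrCounter(Enum, long), and the WordCounters enum and the aggregate method are likewise assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class CounterAggregationSketch {

    /** Hypothetical application-defined counters. */
    enum WordCounters { INPUT_WORDS, SKIPPED_WORDS }

    /** Stand-in for the counters one task accumulates; not a Hadoop class. */
    static final class TaskCounters {
        private final Map<WordCounters, Long> values = new EnumMap<>(WordCounters.class);

        /** Mirrors the shape of incrCounter(Enum, long): add amount to the named counter. */
        void incrCounter(WordCounters key, long amount) {
            values.merge(key, amount, Long::sum);
        }
    }

    /** Sums each counter across all tasks, as a framework-side aggregation might. */
    static Map<WordCounters, Long> aggregate(List<TaskCounters> tasks) {
        Map<WordCounters, Long> totals = new EnumMap<>(WordCounters.class);
        for (TaskCounters task : tasks) {
            task.values.forEach((key, value) -> totals.merge(key, value, Long::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        TaskCounters map1 = new TaskCounters();
        map1.incrCounter(WordCounters.INPUT_WORDS, 3);

        TaskCounters map2 = new TaskCounters();
        map2.incrCounter(WordCounters.INPUT_WORDS, 4);
        map2.incrCounter(WordCounters.SKIPPED_WORDS, 1);

        System.out.println(aggregate(Arrays.asList(map1, map2)));
    }
}
```

Each task only ever touches its own counters; the totals exist only after the per-task values are summed.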

- +

DistributedCache

@@ -1874,7 +1877,7 @@ DistributedCache.createSymlink(Configuration) api. Files have execution permissions set.

- +

Tool

The Tool interface supports the handling of generic Hadoop command-line options. @@ -1914,7 +1917,7 @@

- +

IsolationRunner

@@ -1938,7 +1941,7 @@

IsolationRunner will run the failed task in a single JVM, which can be in the debugger, over precisely the same input.

- +

Debugging

The Map/Reduce framework provides a facility to run user-provided scripts for debugging. When a map/reduce task fails, the user can run @@ -1949,7 +1952,7 @@

In the following sections we discuss how to submit a debug script along with the job. To submit a debug script, it first has to be distributed. Then the script has to be supplied in the Configuration.

- +
How to distribute script file:

To distribute the debug script file, first copy the file to the dfs. @@ -1972,7 +1975,7 @@ DistributedCache.createSymlink(Configuration) api.

- +
How to submit script:

A quick way to submit a debug script is to set values for the properties "mapred.map.task.debug.script" and @@ -1996,17 +1999,17 @@ $script $stdout $stderr $syslog $jobconf $program
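
The argument order shown above ($script $stdout $stderr $syslog $jobconf $program) can be pictured as a simple command line. The sketch below only assembles such a list in that order; it is not how the framework invokes the script, and the class name DebugScriptCommandSketch, the method buildCommand, and all file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class DebugScriptCommandSketch {

    /**
     * Builds the argument list in the order given in the text:
     * script, stdout, stderr, syslog, jobconf, program.
     */
    static List<String> buildCommand(String script, String stdout, String stderr,
                                     String syslog, String jobconf, String program) {
        List<String> command = new ArrayList<>();
        command.add(script);
        command.add(stdout);
        command.add(stderr);
        command.add(syslog);
        command.add(jobconf);
        command.add(program);
        return command;
    }

    public static void main(String[] args) {
        // All names below are placeholders chosen for illustration.
        List<String> command = buildCommand(
                "./debug.sh", "stdout", "stderr", "syslog", "job.xml", "wordcount");
        System.out.println(String.join(" ", command));
    }
}
```

A debug script written against this order would read its own task's stdout, stderr, syslog and jobconf from its first four arguments.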

- +
Default Behavior:

For pipes, a default script is run to process core dumps under gdb; it prints the stack trace and gives information about running threads.

- +

JobControl

JobControl is a utility which encapsulates a set of Map-Reduce jobs and their dependencies.

- +

Data Compression

Hadoop Map-Reduce provides facilities for the application-writer to specify compression for both intermediate map-outputs and the @@ -2020,7 +2023,7 @@ codecs for reasons of both performance (zlib) and non-availability of Java libraries (lzo). More details on their usage and availability are available here.

- +
Intermediate Outputs

Applications can control compression of intermediate map-outputs via the @@ -2041,7 +2044,7 @@ JobConf.setMapOutputCompressionType(SequenceFile.CompressionType) api.

- +
Job Outputs

Applications can control compression of job-outputs via the @@ -2061,7 +2064,7 @@ - +

Example: WordCount v2.0

Here is a more complete WordCount which uses many of the @@ -2071,7 +2074,7 @@ pseudo-distributed or fully-distributed Hadoop installation.

- +

Source Code

@@ -3160,7 +3163,7 @@ @@ -3281,7 +3284,7 @@
111.
-    conf.setInputPath(new Path(other_args.get(0)));
+    FileInputFormat.setInputPaths(conf, new Path(other_args.get(0)));
- +

Sample Runs

Sample text-files as input:

@@ -3449,7 +3452,7 @@

- +

Highlights

The second version of WordCount improves upon the previous one by using some features offered by the Map-Reduce framework: