Date: Mon, 7 Mar 2011 23:06:59 +0000 (UTC)
From: "Alexis (JIRA)"
To: mapreduce-dev@hadoop.apache.org
Subject: [jira] Resolved: (MAPREDUCE-2229) Initialize reader in Sort example

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexis resolved
MAPREDUCE-2229.
-------------------------------

    Resolution: Duplicate
    Fix Version/s: 0.22.0

> Initialize reader in Sort example
> ---------------------------------
>
>                 Key: MAPREDUCE-2229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2229
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: examples
>    Affects Versions: 0.21.0
>            Reporter: Alexis
>             Fix For: 0.22.0
>
>
> As described in the "Total Sort" section of the HTDG book (Hadoop: The Definitive Guide), page 223, I tried to create a Hadoop job that globally sorts some input, using InputSampler with TotalOrderPartitioner.
> To reproduce the exception, run the mapreduce Sort example with the following arguments:
> {noformat}
> org.apache.hadoop.examples.Sort
> -r 2
> -outKey org.apache.hadoop.io.Text
> -outValue org.apache.hadoop.io.Text
> -inFormat org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat
> -outFormat org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
> -totalOrder 0.1 10000 10
> test/sortInput
> test/sortOutput
> {noformat}
> The issue has already been reported here:
> - http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201011.mbox/%3CDB1B07B75C01FB40B814678DEE6E0085175C86CDFF@bdc.taomee-ex.com%3E
> - http://www.mail-archive.com/mapreduce-user@hadoop.apache.org/msg01372.html
> This is a somewhat related comment:
> http://www.mail-archive.com/common-user@hadoop.apache.org/msg03947.html
> We need to initialize the reader to avoid the NPE that occurs when generating the partition file:
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
> 	at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:149)
> 	at org.apache.hadoop.mapreduce.lib.input.KeyValueLineRecordReader.nextKeyValue(KeyValueLineRecordReader.java:91)
> 	at org.apache.hadoop.mapreduce.lib.partition.InputSampler$RandomSampler.getSample(InputSampler.java:220)
> 	at org.apache.hadoop.mapreduce.lib.partition.InputSampler.writePartitionFile(InputSampler.java:315)
> 	at org.apache.hadoop.examples.Sort.run(Sort.java:166)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
> 	at org.apache.hadoop.examples.Sort.main(Sort.java:192)
> {noformat}
> Right now, this initialization only happens in runNewMapper in org.apache.hadoop.mapred.MapTask, but the sampling is performed before the job starts. The TeraInputFormat class used by TeraSort has its own writePartitionFile method. This is the javadoc comment of the createRecordReader method in the InputFormat class:
> {noformat}
> * Create a record reader for a given split. The framework will call
> * {@link RecordReader#initialize(InputSplit, TaskAttemptContext)} before
> * the split is used.
> {noformat}
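The failure mode and the fix can be illustrated with a small, self-contained model of the reader lifecycle. The names below (`SamplerSketch`, `Split`, `LineReader`, `sample`) are hypothetical stand-ins, not Hadoop's API; they only mimic the contract quoted in the javadoc: `createRecordReader` returns an uninitialized reader, and reading from it before `initialize` fails with a NullPointerException, much as `LineRecordReader.nextKeyValue` does in the stack trace. In the real API the missing step corresponds to calling `RecordReader#initialize(InputSplit, TaskAttemptContext)` on the reader before sampling from it. For simplicity this sketch takes the first records of each split rather than sampling randomly the way `InputSampler.RandomSampler` does.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model of the new-API reader lifecycle; these are
// NOT Hadoop's classes. createRecordReader() returns an uninitialized reader,
// and initialize() must run before nextKeyValue().
public class SamplerSketch {

    // Hypothetical stand-in for an InputSplit: just the lines it covers.
    static final class Split {
        final List<String> lines;

        Split(List<String> lines) {
            this.lines = lines;
        }
    }

    // Hypothetical stand-in for a line-oriented RecordReader.
    static final class LineReader {
        private Iterator<String> in; // stays null until initialize() is called
        private String current;

        void initialize(Split split) {
            in = split.lines.iterator();
        }

        boolean nextKeyValue() {
            // Dereferences state created by initialize(); if initialize() was
            // skipped, in is null and this throws NullPointerException.
            if (!in.hasNext()) {
                return false;
            }
            current = in.next();
            return true;
        }

        String getCurrentKey() {
            return current;
        }
    }

    // Mirrors the createRecordReader() contract: the reader is NOT initialized here.
    static LineReader createRecordReader(Split split) {
        return new LineReader();
    }

    // Takes up to maxPerSplit keys from each split (first records, not random).
    // With initialize == false it reproduces the bug: reading from a reader
    // that was never initialized.
    static List<String> sample(List<Split> splits, int maxPerSplit, boolean initialize) {
        List<String> samples = new ArrayList<>();
        for (Split split : splits) {
            LineReader reader = createRecordReader(split);
            if (initialize) {
                reader.initialize(split); // the step the sampling path was missing
            }
            int taken = 0;
            while (taken < maxPerSplit && reader.nextKeyValue()) {
                samples.add(reader.getCurrentKey());
                taken++;
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        List<Split> splits = List.of(
                new Split(List.of("b", "a", "c")),
                new Split(List.of("e", "d")));
        System.out.println(sample(splits, 2, true)); // prints [b, a, e, d]
        try {
            sample(splits, 2, false);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException: reader was never initialized");
        }
    }
}
```

The only difference between the passing and failing runs is whether `initialize` is called between creating the reader and the first `nextKeyValue`, which is the ordering the javadoc promises the framework will provide but which the sampling code path, running before the job starts, skips.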