From: Gino Gu01
To: "user@hadoop.apache.org"
Subject: An issue in MapReduce Tutorial
Date: Mon, 24 Nov 2014 09:28:34 +0000

There is a bug in WordCount v2.0, which is part of the MapReduce Tutorial.

How to reproduce:

Run the application:

$ bin/hadoop jar wc.jar WordCount2 /user/joe/wordcount/input /user/joe/wordcount/output

It throws a NullPointerException during the map phase.

 

Reason:

The highlighted line below sets the default value of wordcount.skip.patterns to true.

But we did not pass any patterns file in the arguments, so patternsURIs is null and the line "for (URI patternsURI : patternsURIs)" throws the exception.

    public void setup(Context context) throws IOException,
        InterruptedException {
      conf = context.getConfiguration();
      caseSensitive = conf.getBoolean("wordcount.case.sensitive", true);
      if (conf.getBoolean("wordcount.skip.patterns", true)) {  // <-- highlighted line
        URI[] patternsURIs = Job.getInstance(conf).getCacheFiles();
        for (URI patternsURI : patternsURIs) {
          Path patternsPath = new Path(patternsURI.getPath());
          String patternsFileName = patternsPath.getName().toString();
          parseSkipFile(patternsFileName);
        }
      }
    }
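
The failure mode described above can be reproduced without Hadoop. The following is only a minimal sketch: the class NullCacheFilesDemo, its methods, and the patterns path are hypothetical, and the null array stands in for what the job apparently hands back when no cache file was registered. It shows that a for-each loop over a null array reference throws NullPointerException, while an empty or populated array does not.

```java
import java.net.URI;

// Hypothetical stand-alone demo; it does not use any Hadoop classes.
class NullCacheFilesDemo {

    // Mirrors the shape of the loop in setup(): walk the URIs and count them.
    // If the array reference itself is null, the for-each statement throws
    // NullPointerException before the body runs.
    static int countPatterns(URI[] patternsURIs) {
        int count = 0;
        for (URI patternsURI : patternsURIs) {
            count++;
        }
        return count;
    }

    // Returns true if iterating over the given array throws NullPointerException.
    static boolean throwsNpe(URI[] patternsURIs) {
        try {
            countPatterns(patternsURIs);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Assumption: this models a job with no cache files registered.
        URI[] none = null;
        // Hypothetical patterns file path, used only for illustration.
        URI[] one = { URI.create("/user/joe/wordcount/patterns.txt") };

        System.out.println("null array throws NPE: " + throwsNpe(none));
        System.out.println("one-element array throws NPE: " + throwsNpe(one));
    }
}
```

An empty array is harmless here; only the null reference triggers the exception, which matches the symptom seen in the map phase.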

 

How to fix it:

Change the highlighted line above to

      if (conf.getBoolean("wordcount.skip.patterns", false)) {
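
The effect of the default value can be illustrated with a small stand-in for the configuration lookup. This is a sketch under assumptions, not Hadoop code: SkipPatternsDefaultDemo and its getBoolean helper are hypothetical, a plain Map replaces the real configuration object, and the parsing is simplified. It assumes the flag is only set to true when a patterns file is actually supplied.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone demo; the Map is a simplified stand-in for a
// Hadoop configuration object and getBoolean is not the library method.
class SkipPatternsDefaultDemo {

    // Returns defaultValue when the key is unset. Otherwise the value is
    // parsed with Boolean.parseBoolean, so anything other than "true"
    // (ignoring case) is treated as false.
    static boolean getBoolean(Map<String, String> conf, String key, boolean defaultValue) {
        String value = conf.get(key);
        if (value == null) {
            return defaultValue;
        }
        return Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) {
        String key = "wordcount.skip.patterns";

        // No patterns file was passed, so the key was never set.
        Map<String, String> unset = new HashMap<>();

        // Assumption: a patterns file was passed and the flag was set explicitly.
        Map<String, String> enabled = new HashMap<>();
        enabled.put(key, "true");

        System.out.println("unset, default true:  " + getBoolean(unset, key, true));
        System.out.println("unset, default false: " + getBoolean(unset, key, false));
        System.out.println("set to true:          " + getBoolean(enabled, key, false));
    }
}
```

With a default of false, the pattern-loading branch is skipped whenever the key is unset and is still entered when the flag has been set explicitly, so the change does not affect runs that do supply a patterns file.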

 
