From: Jean-Marc Spaggiari
Date: Fri, 24 May 2013 23:32:17 -0400
Subject: Re: Child Error
To: user@hadoop.apache.org

Hi Jim,

Which JVM are you using?

I don't think you have a memory issue; otherwise you would have gotten an
OOME somewhere...
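To sanity-check the numbers from the config you posted below (heap sizes
only, so rough figures; the JVM processes themselves take somewhat more):

    6 child slots x 1100 MB (mapred.child.java.opts)  = 6600 MB
    DataNode heap (HADOOP_HEAP)                       =  500 MB
    TaskTracker heap (HADOOP_HEAP)                    =  500 MB
                                                        -------
                                                        7600 MB

That is tight on an 8 GB box once the OS gets its share, but if a child had
really blown its heap you would expect an OutOfMemoryError in the task logs
and a dump under /var/tmp/soner from -XX:+HeapDumpOnOutOfMemoryError, not a
bare "exit with nonzero status of 1".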
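Also, as far as I can tell, the "invalid null input: name" at the bottom of
your first stack trace means UnixLoginModule got a null user name back from
the OS when the child tried to log in. That smells like a user-lookup hiccup
on the node (e.g. LDAP/NIS/nscd timing out under load) rather than anything
memory related. If your users come from a directory service, an illustrative
probe like this on a worker while a heavy job runs might catch the lookups
failing (adjust the user name to whoever runs the tasks):

    while true; do id jim >/dev/null || date; sleep 1; done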
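On the Jetty question: on a 1.x-style tarball install, swapping the jars
under the lib directory on every node and restarting is essentially how a
downgrade is done, but treat the sketch below as illustrative only -- the
exact jar names and the version to drop in depend on what ships in your
lib/ and on what the JIRA discussion recommends, which I have not verified
myself:

    # on every node, after stopping MapReduce (bin/stop-mapred.sh on the master)
    cd $HADOOP_HOME/lib
    mkdir -p /tmp/jetty-backup
    mv jetty-*.jar /tmp/jetty-backup/     # sets aside jetty and jetty-util
    cp /path/to/older/jetty-X.Y.Z.jar /path/to/older/jetty-util-X.Y.Z.jar .
    # then restart, from the master: bin/start-mapred.sh

I would try it on a single test node first.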
JM

2013/5/24 Jim Twensky <jim.twensky@gmail.com>:

> Hi again. In addition to my previous post, I was able to get some error
> logs from the task tracker/data node this morning, and it looks like it
> might be a Jetty issue:
>
> 2013-05-23 19:59:20,595 WARN org.apache.hadoop.mapred.TaskLog: Failed to
> retrieve stdout log for task: attempt_201305231647_0007_m_001096_0
> java.io.IOException: Owner 'jim' for path
> /var/tmp/jim/hadoop-logs/userlogs/job_201305231647_0007/attempt_201305231647_0007_m_001096_0/stdout
> did not match expected owner '10929'
>     at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:177)
>     at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:117)
>     at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:455)
>     at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:81)
>     at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:296)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>     at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:848)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>
> I am wondering if I am hitting MAPREDUCE-2389. If so, how do I downgrade
> my Jetty version? Should I just replace the Jetty jar file in the lib
> directory with an earlier version and restart my cluster?
>
> Thank you.
>
>
> On Thu, May 23, 2013 at 7:14 PM, Jim Twensky <jim.twensky@gmail.com> wrote:
>
>> Hello, I have a 20-node Hadoop cluster where each node has 8 GB of memory
>> and an 8-core processor. I sometimes get the following error on a random
>> basis:
>>
>> -----------------------------------------------------------------------------------------------------------
>> Exception in thread "main" java.io.IOException: Exception reading file:/var/tmp/jim/hadoop-jim/mapred/local/taskTracker/jim/jobcache/job_201305231647_0005/jobToken
>>     at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:135)
>>     at org.apache.hadoop.mapreduce.security.TokenCache.loadTokens(TokenCache.java:165)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:92)
>> Caused by: java.io.IOException: failure to login
>>     at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:501)
>>     at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:463)
>>     at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1519)
>>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1420)
>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
>>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
>>     at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:129)
>>     ... 2 more
>> Caused by: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name
>>     at com.sun.security.auth.UnixPrincipal.<init>(UnixPrincipal.java:70)
>>     at com.sun.security.auth.module.UnixLoginModule.login(UnixLoginModule.java:132)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     ......
>> -----------------------------------------------------------------------------------------------------------
>>
>> This does not always happen, but I see a pattern: when the intermediate
>> data is larger, it tends to occur more frequently. In the web log, I can
>> see the following:
>>
>> java.lang.Throwable: Child Error
>>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
>> Caused by: java.io.IOException: Task process exit with nonzero status of 1.
>>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
>>
>> From what I have read online, a possible cause is not having enough memory
>> for all the JVMs. My mapred-site.xml is set up to allocate 1100 MB to each
>> child, and the maximum numbers of map and reduce tasks are both set to 3 -
>> so 6600 MB for the child JVMs + (500 MB * 2) for the data node and task
>> tracker (as I set HADOOP_HEAP to 500 MB). I feel like memory is not the
>> cause, but I haven't been able to rule it out so far.
>>
>> In case it helps, here are the relevant sections of my mapred-site.xml:
>>
>> -----------------------------------------------------------------------------------------------------------
>> <property>
>>   <name>mapred.tasktracker.map.tasks.maximum</name>
>>   <value>3</value>
>> </property>
>>
>> <property>
>>   <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>   <value>3</value>
>> </property>
>>
>> <property>
>>   <name>mapred.child.java.opts</name>
>>   <value>-Xmx1100M -ea -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp/soner</value>
>> </property>
>>
>> <property>
>>   <name>mapred.reduce.parallel.copies</name>
>>   <value>5</value>
>> </property>
>>
>> <property>
>>   <name>tasktracker.http.threads</name>
>>   <value>80</value>
>> </property>
>> -----------------------------------------------------------------------------------------------------------
>>
>> My jobs still complete most of the time, though they occasionally fail,
>> and I'm really puzzled at this point. I'd appreciate any help or ideas.
>>
>> Thanks