Return-Path:
X-Original-To: apmail-hadoop-common-user-archive@www.apache.org
Delivered-To: apmail-hadoop-common-user-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id DE674D657 for ; Thu, 13 Sep 2012 17:40:00 +0000 (UTC)
Received: (qmail 64307 invoked by uid 500); 13 Sep 2012 17:39:55 -0000
Delivered-To: apmail-hadoop-common-user-archive@hadoop.apache.org
Received: (qmail 64035 invoked by uid 500); 13 Sep 2012 17:39:55 -0000
Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: user@hadoop.apache.org
Delivered-To: mailing list user@hadoop.apache.org
Received: (qmail 64028 invoked by uid 99); 13 Sep 2012 17:39:55 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 13 Sep 2012 17:39:55 +0000
X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=FSL_RCVD_USER,HTML_MESSAGE,NORMAL_HTTP_TO_IP,RCVD_IN_DNSWL_LOW,SPF_NEUTRAL,WEIRD_PORT
X-Spam-Check-By: apache.org
Received-SPF: neutral (nike.apache.org: local policy)
Received: from [209.85.212.48] (HELO mail-vb0-f48.google.com) (209.85.212.48) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 13 Sep 2012 17:39:49 +0000
Received: by vbme21 with SMTP id e21so2884523vbm.35 for ; Thu, 13 Sep 2012 10:39:27 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113;
         h=mime-version:in-reply-to:references:date:message-id:subject:from:to
         :content-type:x-gm-message-state;
         bh=BAeFozhRr0d1arPci0QJsbGiFvI4S4SXC2GaOTsmmLo=;
         b=O6UV9LZ8CRu1194TV7A4yFjVhopZeMQ0vDn1PPJxgfxhi5uAwU23CG0vIkQ+9XwSDN
         cfWtXX2KaBV+QgQNxWbucBcu3UPwJ0Nm4cZKhmUUom4BFD6oJy9laF4o8UEtiCAcgfuS
         AfXG9TChvhnTE6br5A9ohuzoSHUovu2SlmiQyyJzljm0SCbN79UBmkYnTQGB5G1CFHoy
         RpWdmqpoZ+a66z/trq0Y7dRI1sW1t9x1ySNGfZHL03/HeGLsbzB2kY48WAmR6Tle9BFd
         gSCw/iPbXp4ls9c9o9ky495JEjue5vwk8nE8z519Rnkfg035sEgIatfWMp5tSCw9dCqk
         ujlw==
MIME-Version: 1.0
Received: by 10.52.26.104 with SMTP id k8mr1476648vdg.79.1347557967446; Thu, 13 Sep 2012 10:39:27 -0700 (PDT)
Received: by 10.58.196.209 with HTTP; Thu, 13 Sep 2012 10:39:27 -0700 (PDT)
In-Reply-To:
References:
Date: Thu, 13 Sep 2012 10:39:27 -0700
Message-ID:
Subject: Re: Hadoop failing jobs non zero exit status 7
From: Aaron Eng
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=20cf307f31e2f6c0f904c998c941
X-Gm-Message-State: ALoCoQk6FTzJDi5aOYjSqjwA6qvVc5JlVX9klNHjz8bD+eKWE0TrIPZZzcyZNDblqJVcjuufZ/9h

--20cf307f31e2f6c0f904c998c941
Content-Type: text/plain; charset=ISO-8859-1

Hi Robin,

"Task process exit with nonzero status of 7." is being printed by the TaskTracker to indicate the child JVM spawned to run the task attempt in question exited unexpectedly. This also means the task was not killed administratively (either by TaskTracker or by you, the admin). So basically, the TaskTracker tried to launch a JVM and it exited.

You didn't post all the details for the attempt from the TaskTracker log, so it's hard to say the specifics of when/how this happened. And I'm not familiar with exit code 7 being returned by a JVM, but this would have been generated by the JVM process itself, not any user code you tried to run in the attempt. It could be that the JVM has some internal issue, some bug of sorts; what Java version are you using? Or it could be the JVM needs something from the environment that is not available/permissible in the context in which it is being executed. So for instance, you could have some limit in place in the execution environment of the TaskTracker which is being hit.

If nothing else, you can note down the way in which the JVM is being spawned and try to spawn it manually. If it's immediately reproducible, knowing whether this comes up when you spawn it directly from the shell vs. being spawned via TaskTracker is a useful bit of info.

If you can't identify the cause, feel free to post in answers.mapr.com or send an email to support@mapr.com for some more assistance.

Best Regards,
Aaron Eng

On Thu, Sep 13, 2012 at 5:38 AM, Robin Verlangen wrote:

> Hi there,
>
> Today we started deploying MapR M3 into production. However, we're having problems completing jobs. During a typical job, the job returns this:
>
> 12/09/11 16:33:20 INFO mapred.JobClient: Task Id : attempt_201209111629_0002_r_000001_2, Status : FAILED on node cl004.flxviz.com
> java.lang.Throwable: Child Error
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:267)
> Caused by: java.io.IOException: Task process exit with nonzero status of 7.
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:254)
> 12/09/11 16:33:20 WARN mapred.JobClient: Error reading task output http://cl004.flxviz.com:50060/tasklog?plaintext=true&attemptid=attempt_201209111629_0002_r_000001_2&filter=stdout
> 12/09/11 16:33:20 WARN mapred.JobClient: Error reading task output http://cl004.flxviz.com:50060/tasklog?plaintext=true&attemptid=attempt_201209111629_0002_r_000001_2&filter=stderr
>
> When I get the logs of the tasktracker, I see things like:
>
> 2012-09-11 16:32:43,204 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201209111629_0002_r_000002_1: java.lang.Throwable: Child Error
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:267)
> Caused by: java.io.IOException: Task process exit with nonzero status of 7.
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:254) on tasktracker tracker_cl004.flxviz.com:localhost/127.0.0.1:53126
> 2012-09-11 16:32:46,234 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201209111629_0002_r_000002_1'
> 2012-09-11 16:32:46,512 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201209111629_0002_m_000011_2' to tip task_201209111629_0002_m_000011, for tracker 'tracker_cl003.flxviz.com:localhost/127.0.0.1:42339'
> 2012-09-11 16:32:48,027 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201209111629_0002_m_000011_2: java.lang.Throwable: Child Error
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:267)
> Caused by: java.io.IOException: Task process exit with nonzero status of 7.
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:254) on tasktracker tracker_cl003.flxviz.com:localhost/127.0.0.1:42339
> 2012-09-11 16:32:51,055 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201209111629_0002_r_000002_2' to tip task_201209111629_0002_r_000002, for tracker 'tracker_cl003.flxviz.com:localhost/127.0.0.1:42339'
> 2012-09-11 16:32:51,056 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201209111629_0002_m_000011_2'
> 2012-09-11 16:32:51,359 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201209111629_0002_r_000002_2: java.lang.Throwable: Child Error
>
> Does anyone have a clue where to start? It doesn't seem to be a MapR-specific problem, which is why I am posting this to the Hadoop mailing list.
>
> Some additional information:
> OS: CentOS 6.3 x64
> 16 GB RAM
> 2x quad-core processor
> 12x 1 TB hard drive
>
> Best regards,
>
> Robin Verlangen
> *Software engineer*
>
> W http://www.robinverlangen.nl
> E robin@us2.nl
>
> Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.
>
>

--20cf307f31e2f6c0f904c998c941
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Hi Robin,

"Task process exit with nonzero status of 7." is being printed by the TaskTracker to indicate the child JVM spawned to run the task attempt in question exited unexpectedly. This also means the task was not killed administratively (either by TaskTracker or by you, the admin). So basically, the TaskTracker tried to launch a JVM and it exited.

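To illustrate the distinction between a child that exits on its own and one that is killed, here is a minimal Python sketch. It is not how the TaskTracker is implemented; `describe_exit` is a hypothetical helper, and the stand-in child is a Python process rather than a JVM. It relies only on the documented POSIX behaviour of `subprocess`, where a negative return code means the child died from a signal.

```python
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Classify a child's return code as reported by subprocess on POSIX."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        return f"killed by signal {-returncode}"
    return f"exited on its own with nonzero status {returncode}"


if __name__ == "__main__":
    # Stand-in child that exits with status 7, like the failing task JVM.
    child = subprocess.run([sys.executable, "-c", "import sys; sys.exit(7)"])
    print(describe_exit(child.returncode))
```

A status of 7 falls into the last branch: the process chose to exit, nobody killed it.
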
You didn't post all the details for the attempt from the TaskTracker log, so it's hard to say the specifics of when/how this happened. And I'm not familiar with exit code 7 being returned by a JVM, but this would have been generated by the JVM process itself, not any user code you tried to run in the attempt. It could be that the JVM has some internal issue, some bug of sorts; what Java version are you using? Or it could be the JVM needs something from the environment that is not available/permissible in the context in which it is being executed. So for instance, you could have some limit in place in the execution environment of the TaskTracker which is being hit.
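
One way to look at the limits mentioned above is a small Python sketch like the following, run as the same user and from the same kind of session that starts the TaskTracker. The choice of which limits to print is an assumption on my part, and `current_limits` and `fmt` are hypothetical helper names; the `resource` module calls themselves are standard on Linux.

```python
import resource

# Limits that commonly matter when launching a JVM; this selection is an assumption.
LIMITS = {
    "open files (RLIMIT_NOFILE)": resource.RLIMIT_NOFILE,
    "processes/threads (RLIMIT_NPROC)": resource.RLIMIT_NPROC,
    "address space (RLIMIT_AS)": resource.RLIMIT_AS,
    "stack size (RLIMIT_STACK)": resource.RLIMIT_STACK,
}


def fmt(value: int) -> str:
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits() -> dict:
    """Return {label: (soft, hard)} for the calling process."""
    return {label: resource.getrlimit(res) for label, res in LIMITS.items()}


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print(f"{label}: soft={fmt(soft)} hard={fmt(hard)}")
```

These are the limits of the process running the script, which child processes inherit. For an already running daemon on Linux, /proc/<pid>/limits shows the values actually in effect.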

If nothing else, you can note down the way in which the JVM is being spawned and try to spawn it manually. If it's immediately reproducible, knowing whether this comes up when you spawn it directly from the shell vs. being spawned via TaskTracker is a useful bit of info.
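
As a rough sketch of that comparison, the Python below runs one command line under two environments and reports the exit code and stderr for each. Everything specific here is an assumption: `run_once` is a hypothetical helper, the command is a placeholder that simply exits with 7 (substitute the java command line you noted from the TaskTracker log), and the PATH-only environment is just a crude stand-in for a daemon's sparser environment.

```python
import os
import shlex
import subprocess
import sys


def run_once(command: str, env: dict) -> tuple:
    """Run a command line with the given environment; return (exit code, stderr)."""
    result = subprocess.run(
        shlex.split(command), env=env, capture_output=True, text=True
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    # Placeholder command: replace with the real child JVM command line.
    command = f"{shlex.quote(sys.executable)} -c 'import sys; sys.exit(7)'"

    shell_env = dict(os.environ)  # what your login shell provides
    minimal_env = {"PATH": os.environ.get("PATH", "")}  # sparse, daemon-like

    for name, env in (("shell", shell_env), ("minimal", minimal_env)):
        code, stderr = run_once(command, env)
        print(f"{name}: exit={code} stderr={stderr.strip()!r}")
```

If the exit code differs between the two runs, the environment is the first place to look; if both fail the same way, the problem is more likely in the JVM or its arguments.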

If you can't identify the cause, feel free to post in answers.mapr.com or send an email to support@mapr.com for some more assistance.

Best Regards,
Aaron Eng

On Thu, Sep 13, 2012 at 5:38 AM, Robin Verlangen <robin@us2.nl> wrote:

Hi there,

Today we started deploying MapR M3 into production. However, we're having problems completing jobs. During a typical job, the job returns this:

12/09/11 16:33:20 INFO mapred.JobClient: Task Id : attempt_201209111629_0002_r_000001_2, Status : FAILED on node cl004.flxviz.com
java.lang.Throwable: Child Error
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:267)
Caused by: java.io.IOException: Task process exit with nonzero status of 7.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:254)
12/09/11 16:33:20 WARN mapred.JobClient: Error reading task output http://cl004.flxviz.com:50060/tasklog?plaintext=true&attemptid=attempt_201209111629_0002_r_000001_2&filter=stdout
12/09/11 16:33:20 WARN mapred.JobClient: Error reading task output http://cl004.flxviz.com:50060/tasklog?plaintext=true&attemptid=attempt_201209111629_0002_r_000001_2&filter=stderr

When I get the logs of the tasktracker, I see things like:

2012-09-11 16:32:43,204 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201209111629_0002_r_000002_1: java.lang.Throwable: Child Error
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:267)
Caused by: java.io.IOException: Task process exit with nonzero status of 7.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:254) on tasktracker tracker_cl004.flxviz.com:localhost/127.0.0.1:53126
2012-09-11 16:32:46,234 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201209111629_0002_r_000002_1'
2012-09-11 16:32:46,512 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201209111629_0002_m_000011_2' to tip task_201209111629_0002_m_000011, for tracker 'tracker_cl003.flxviz.com:localhost/127.0.0.1:42339'
2012-09-11 16:32:48,027 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201209111629_0002_m_000011_2: java.lang.Throwable: Child Error
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:267)
Caused by: java.io.IOException: Task process exit with nonzero status of 7.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:254) on tasktracker tracker_cl003.flxviz.com:localhost/127.0.0.1:42339
2012-09-11 16:32:51,055 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201209111629_0002_r_000002_2' to tip task_201209111629_0002_r_000002, for tracker 'tracker_cl003.flxviz.com:localhost/127.0.0.1:42339'
2012-09-11 16:32:51,056 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201209111629_0002_m_000011_2'
2012-09-11 16:32:51,359 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201209111629_0002_r_000002_2: java.lang.Throwable: Child Error

Does anyone have a clue where to start? It doesn't seem to be a MapR-specific problem, which is why I am posting this to the Hadoop mailing list.

Some additional information:
OS: CentOS 6.3 x64
16 GB RAM
2x quad-core processor
12x 1 TB hard drive

Best regards,

Robin Verlangen
Software engineer

E robin@us2.nl

Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.


--20cf307f31e2f6c0f904c998c941--