hadoop-common-user mailing list archives

From Dan Lidral-Porter <li...@aperiodic.org>
Subject Task process JVM exits with status code 1
Date Tue, 01 Nov 2011 20:34:38 GMT
Hi Everybody,

I'm having an issue with CDH3u0 where some of my reduce tasks are failing due to a Child Error
caused by the task JVM exiting with a status of 1. From hunting around in the mailing list
archives, it seems that this usually happens for one of two reasons:

1. The userlog directory has too many subdirectories, so the task fails when the necessary
logs can't be created. This isn't the case here, since there are only a few dozen subdirectories.

2. The mapred.child.ulimit configuration parameter is lower than the max heap size set by
mapred.child.java.opts. Again, I don't think that this is the cause, since I've set mapred.child.ulimit
to about 2 GB (the exact value is 2,097,000; the parameter is specified in KB), while the heap
size set in mapred.child.java.opts is 1024 MB.
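
As a sanity check on the second point, here is a minimal sketch of the arithmetic. It assumes that mapred.child.ulimit is given in KB and that the heap is set with a flag of the form -Xmx1024m; the exact mapred.child.java.opts string is not quoted above, so that literal is an assumption, and the helper names are made up for illustration.

```python
import re


def parse_xmx_bytes(java_opts):
    """Return the -Xmx heap size in bytes, or None if no -Xmx flag is present."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if match is None:
        return None
    value = int(match.group(1))
    unit = match.group(2).lower()
    # No suffix means bytes; k/m/g are binary multiples.
    multiplier = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}[unit]
    return value * multiplier


def ulimit_headroom_bytes(ulimit_kb, java_opts):
    """Bytes left under the ulimit after the max heap (negative if heap exceeds it)."""
    heap_bytes = parse_xmx_bytes(java_opts)
    if heap_bytes is None:
        raise ValueError("no -Xmx flag found in java opts")
    return ulimit_kb * 1024 - heap_bytes


# Values from this message; "-Xmx1024m" is an assumed spelling of the 1024 MB heap.
headroom = ulimit_headroom_bytes(2097000, "-Xmx1024m")
print("headroom above max heap: %.1f MB" % (headroom / 1024.0 ** 2))
```

A positive result only shows that the limit is above the max heap. The limit applies to the whole process's virtual memory, which also covers non-heap memory such as thread stacks and native allocations, so it does not by itself rule out the ulimit as a factor.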

There's nothing in the stdout or stderr logs for the failed tasks, and the syslog seems normal.
Nothing in the TT log pertaining to the tasks looks out of the ordinary either, up to the point
where the task's JVM fails. For reference, a task's syslog and an excerpt of the TT log
while the task was running are available here: https://gist.github.com/1331700. The TT's mapred-site.xml
(slightly redacted) is available here: https://gist.github.com/1331740.

I don't think the issue has anything to do with the code itself, since the code is well-tested,
and runs fine most of the time. However, I do think that it's something to do with memory,
since the jobs whose tasks fail are the ones that process a lot of data. Could the task JVM
exit with status 1 for any reason other than the two I listed above (particularly a memory-related
reason)? Or am I goofing something else up?

Cheers,
Dan Lidral-Porter