hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1119) When tasks fail to report status, show tasks's stack dump before killing
Date Sat, 28 Nov 2009 04:06:22 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783203#action_12783203

Aaron Kimball commented on MAPREDUCE-1119:

To address your various points:

bq. src/c++/task-controller/main.c
The program attempts to deduce its configuration directory based on {{argv[0]}}. Given that
task-controller should be in {{$HADOOP_HOME/bin/}}, it will do some surgery on the program
name to remove "bin/task-controller" and add the "conf" suffix, giving {{$HADOOP_HOME/conf}}.
Unfortunately, this presumes that {{argv[0]}} is the absolute path to {{task-controller}}.
If it's not, then this will result in a negative offset being calculated -- and it will attempt
to malloc a negative amount of space, which causes a SIGSEGV. This checks that task-controller
was invoked with {{argv[0]}} set to the absolute path to task-controller, and not just something
like './task-controller'

Hadoop may invoke task-controller via its absolute path, but when I was unit testing it, I
just ran './task-controller', which was causing it to segfault. This should exit politely

bq. Now that the semantics have changed, I am not very sure we want to use the same configuration
property for sleeping after dump-stack. (Thinking aloud..) Do we even need a sleep here? The
signalling order is SIGQUIT->SIGTERM->SIGKILL. Will signals be processed in the order
of their arrival? If so, then we will not another sleep. If not, we may need a sleep here,
but may or may not be driven by the same config item. What do you think?

I'm not honestly sure. I believe that the operating system will raise the signals to the process
in order. However, what it means to "handle" a signal is ambiguous. It may be the case that
the signal handler for SIGQUIT merely sets a flag which tells another thread to print the
stack dumps to stdout at some other convenient interval (e.g., during a gc when the world
is stopped). So while SIGQUIT may be received by the JVM before SIGTERM, I don't take that
as a guarantee that the data will reach stdout when we want it to. So I think the delay is
necessary. As for adding a new config item.. maybe wait til someone makes an issue of this?
Is there any performance implication here?

bq. Are we sure "PSPermGen" will always be there in the dump? Instead how about passing our
own TaskController that does custom actions in TaskController.dumpStacks(), simplifying our
verification that dump-stack is indeed called?

In the Sun JVM, the stack dump messages always end with a description of the heap, which includes
this key string. I doubt this is standardized; JRockit or another JVM will probably use some
other string. Testing with a MockTaskController is probably a good idea here.

bq. The test-time can be more than halved if we set max-map-attempts to one in both the tests
via conf.setMaxMapAttempts(1);

Good observation. will do.

bq. We need a similar test for LinuxTaskController to test stack-dump when multiple users
are involved.

I'll look into this.

I'll also handle the other more minor suggestions you made as well.
Thanks for the review!

> When tasks fail to report status, show tasks's stack dump before killing
> ------------------------------------------------------------------------
>                 Key: MAPREDUCE-1119
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1119
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1119.2.patch, MAPREDUCE-1119.3.patch, MAPREDUCE-1119.4.patch,
MAPREDUCE-1119.5.patch, MAPREDUCE-1119.patch
> When the TT kills tasks that haven't reported status, it should somehow gather a stack
dump for the task. This could be done either by sending a SIGQUIT (so the dump ends up in
stdout) or perhaps something like JDI to gather the stack directly from Java. This may be
somewhat tricky since the child may be running as another user (so the SIGQUIT would have
to go through LinuxTaskController). This feature would make debugging these kinds of failures
much easier, especially if we could somehow get it into the TaskDiagnostic message

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message