hadoop-common-user mailing list archives

From nataraj jonnalagadda <nataraj.jonnalaga...@gmail.com>
Subject Re: tasks stuck in UNASSIGNED state
Date Tue, 16 Jun 2015 06:24:19 GMT
Hey Matt,

It's possibly due to your YARN config: the YARN/MapReduce ACLs, the YARN
scheduler config, or cgroups (in case they're enabled) not being set up
correctly. We could dig in more if we had the yarn-site.xml and scheduler
conf files.
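
In case it helps narrow down where the launcher thread is stuck, here is a
minimal standalone sketch of the wait/notifyAll hand-off that the TaskLauncher
code quoted below relies on. It is not Hadoop code: the class
LauncherQueueSketch, the methods takeNext and launchAll, and the plain String
task ids are all hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the TaskLauncher hand-off; not Hadoop code.
public class LauncherQueueSketch {
    private final List<String> tasksToLaunch = new LinkedList<>();

    // Producer side: enqueue under the list's monitor and wake any waiter.
    public void addToTaskQueue(String taskId) {
        synchronized (tasksToLaunch) {
            tasksToLaunch.add(taskId);
            tasksToLaunch.notifyAll();
        }
    }

    // Consumer side: block until a task is queued, then remove the oldest one.
    public String takeNext() throws InterruptedException {
        synchronized (tasksToLaunch) {
            while (tasksToLaunch.isEmpty()) {
                tasksToLaunch.wait();
            }
            return tasksToLaunch.remove(0);
        }
    }

    // Starts one launcher thread, enqueues every id, and returns the ids in
    // the order the launcher thread picked them up.
    public static List<String> launchAll(List<String> ids) throws InterruptedException {
        LauncherQueueSketch queue = new LauncherQueueSketch();
        List<String> launched = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(ids.size());
        Thread launcher = new Thread(() -> {
            try {
                while (true) {
                    launched.add(queue.takeNext());
                    done.countDown();
                }
            } catch (InterruptedException e) {
                // Interrupted by launchAll once the queue has been drained.
            }
        }, "launcher-sketch");
        launcher.setDaemon(true);
        launcher.start();
        for (String id : ids) {
            queue.addToTaskQueue(id);
        }
        boolean finished = done.await(5, TimeUnit.SECONDS);
        launcher.interrupt();
        launcher.join(1000);
        if (!finished) {
            throw new IllegalStateException("launcher did not drain the queue");
        }
        return new ArrayList<>(launched);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(launchAll(Arrays.asList("m_000286", "r_000004")));
    }
}
```

With this pattern, a launcher thread that is merely idle sits in the WAITING
state inside wait(). A thread dump of the affected TaskTracker should show
whether the map launcher is parked there or is stuck in some later frame.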


Thanks,
Nat.



On Mon, Jun 15, 2015 at 10:39 PM, Matt K <matvey1414@gmail.com> wrote:

> I see there are 2 threads - one that kicks off the mappers, and another that
> kicks off reducers. The one that kicks off the mappers got stuck. It's not
> yet clear to me where it got stuck exactly.
>
> On Tue, Jun 16, 2015 at 1:11 AM, Matt K <matvey1414@gmail.com> wrote:
>
>> Hi all,
>>
>> I'm dealing with a production issue, any help would be appreciated. I am
>> seeing very strange behavior in the TaskTrackers. After they pick up the
>> task, it never comes out of the UNASSIGNED state, and the task just gets
>> killed 10 minutes later.
>>
>> 2015-06-16 02:42:21,114 INFO org.apache.hadoop.mapred.TaskTracker:
>> LaunchTaskAction (registerTask): attempt_201506152116_0046_m_000286_0
>> task's state:UNASSIGNED
>> 2015-06-16 02:52:21,805 INFO org.apache.hadoop.mapred.TaskTracker:
>> attempt_201506152116_0046_m_000286_0: Task
>> attempt_201506152116_0046_m_000286_0 failed to report status for 600
>> seconds. Killing!
>>
>> Normally, I would see the following in the logs:
>>
>> 2015-06-16 04:30:32,328 INFO org.apache.hadoop.mapred.TaskTracker: Trying
>> to launch : attempt_201506152116_0062_r_000004_0 which needs 1 slots
>>
>> However, it doesn't get this far for these particular tasks. I am
>> perusing the source code here, and this doesn't seem to be possible:
>>
>> http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-737/org/apache/hadoop/mapred/TaskTracker.java#TaskTracker.TaskLauncher.0tasksToLaunch
>>
>> The code does something like this:
>>
>>     public void addToTaskQueue(LaunchTaskAction action) {
>>       synchronized (tasksToLaunch) {
>>         TaskInProgress tip = registerTask(action, this);
>>         tasksToLaunch.add(tip);
>>         tasksToLaunch.notifyAll();
>>       }
>>     }
>>
>> The following should pick it up:
>>
>>     public void run() {
>>       while (!Thread.interrupted()) {
>>         try {
>>           TaskInProgress tip;
>>           Task task;
>>           synchronized (tasksToLaunch) {
>>             while (tasksToLaunch.isEmpty()) {
>>               tasksToLaunch.wait();
>>             }
>>             //get the TIP
>>             tip = tasksToLaunch.remove(0);
>>             task = tip.getTask();
>>             LOG.info("Trying to launch : " + tip.getTask().getTaskID() +
>>                      " which needs " + task.getNumSlotsRequired() + " slots");
>>           }
>>
>> What's even stranger is that this is happening for Map tasks only. Reduce
>> tasks are fine.
>>
>> This is only happening on a handful of the nodes, but enough to either slow
>> down jobs or cause them to fail.
>>
>> We're running Hadoop 2.3.0-cdh5.0.2
>>
>> Thanks,
>>
>> -Matt
>>
>>
>
>
> --
> www.calcmachine.com - easy online calculator.
>
