hadoop-user mailing list archives

From Matt K <matvey1...@gmail.com>
Subject Re: tasks stuck in UNASSIGNED state
Date Tue, 16 Jun 2015 05:39:28 GMT
I see there are two threads: one that kicks off the mappers and another that
kicks off the reducers. The one that kicks off the mappers got stuck. It's
not yet clear to me exactly where it got stuck.
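
One way to narrow down where a thread is blocked is to look at its stack trace. On a live TaskTracker, running the JDK's jstack tool against the process should show the same information. The sketch below only illustrates the idea in-process: it filters the JVM's threads by name and prints each matching thread's state and frames. The class name, the helper name and the thread name "demo-map-launcher" are made up for the example; they are not taken from the TaskTracker source.

```java
import java.util.Map;

public class StuckThreadFinder {

    /** Formats the state and stack trace of every live thread whose name contains nameFragment. */
    static String dumpThreadsMatching(String nameFragment) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (!t.getName().contains(nameFragment)) {
                continue;
            }
            out.append(t.getName()).append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        final Object lock = new Object();
        // Hypothetical stand-in for a launcher thread that is parked and never notified.
        Thread launcher = new Thread(() -> {
            synchronized (lock) {
                try {
                    lock.wait();
                } catch (InterruptedException ignored) {
                    // exit quietly when interrupted
                }
            }
        }, "demo-map-launcher");
        launcher.setDaemon(true);
        launcher.start();

        // Give the thread time to reach wait() before sampling its stack.
        while (launcher.getState() != Thread.State.WAITING) {
            Thread.sleep(10);
        }
        System.out.print(dumpThreadsMatching("demo-map-launcher"));
    }
}
```

The top frames of the matching thread show whether it is parked in wait(), blocked on a monitor, or busy in some later step.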

On Tue, Jun 16, 2015 at 1:11 AM, Matt K <matvey1414@gmail.com> wrote:

> Hi all,
>
> I'm dealing with a production issue; any help would be appreciated. I am
> seeing very strange behavior in the TaskTrackers. After they pick up the
> task, it never comes out of the UNASSIGNED state, and the task just gets
> killed 10 minutes later.
>
> 2015-06-16 02:42:21,114 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_201506152116_0046_m_000286_0
> task's state:UNASSIGNED
> 2015-06-16 02:52:21,805 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201506152116_0046_m_000286_0: Task
> attempt_201506152116_0046_m_000286_0 failed to report status for 600
> seconds. Killing!
>
> Normally, I would see the following in the logs:
>
> 2015-06-16 04:30:32,328 INFO org.apache.hadoop.mapred.TaskTracker: Trying
> to launch : attempt_201506152116_0062_r_000004_0 which needs 1 slots
>
> However, it doesn't get this far for these particular tasks. I am perusing
> the source code here, and this doesn't seem to be possible:
>
> http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-737/org/apache/hadoop/mapred/TaskTracker.java#TaskTracker.TaskLauncher.0tasksToLaunch
>
> The code does something like this:
>
>     public void addToTaskQueue(LaunchTaskAction action) {
>       synchronized (tasksToLaunch) {
>         TaskInProgress tip = registerTask(action, this);
>         tasksToLaunch.add(tip);
>         tasksToLaunch.notifyAll();
>       }
>     }
>
> The following should pick it up:
>
>     public void run() {
>       while (!Thread.interrupted()) {
>         try {
>           TaskInProgress tip;
>           Task task;
>           synchronized (tasksToLaunch) {
>             while (tasksToLaunch.isEmpty()) {
>               tasksToLaunch.wait();
>             }
>             //get the TIP
>             tip = tasksToLaunch.remove(0);
>             task = tip.getTask();
>             LOG.info("Trying to launch : " + tip.getTask().getTaskID() +
>                      " which needs " + task.getNumSlotsRequired() + " slots");
>           }
>
> What's even stranger is that this is happening for Map tasks only. Reduce tasks are fine.
>
> This is only happening on a handful of the nodes, but enough to either slow
> down jobs or cause them to fail.
>
> We're running Hadoop 2.3.0-cdh5.0.2
>
> Thanks,
>
> -Matt
>
>

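For reference, the queue handoff quoted above can be reproduced in isolation. This is a minimal sketch, not the TaskTracker code: task IDs are plain strings, and the CountDownLatch named gate is a hypothetical stand-in for whatever the launcher thread might be blocked on after it has taken a task off the list. It illustrates one possible explanation only: if the single launcher thread is stuck on an earlier task, later tasks are still added to the list but are never removed from it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class LauncherHandoffDemo {
    private final List<String> tasksToLaunch = new ArrayList<>();
    private final List<String> launched = new ArrayList<>();
    private final CountDownLatch gate; // hypothetical: models a step that blocks the launcher

    LauncherHandoffDemo(CountDownLatch gate) {
        this.gate = gate;
    }

    /** Producer side: same add-then-notifyAll shape as the quoted addToTaskQueue. */
    void addToTaskQueue(String taskId) {
        synchronized (tasksToLaunch) {
            tasksToLaunch.add(taskId);
            tasksToLaunch.notifyAll();
        }
    }

    int queued() {
        synchronized (tasksToLaunch) {
            return tasksToLaunch.size();
        }
    }

    int launchedCount() {
        synchronized (launched) {
            return launched.size();
        }
    }

    /** Consumer side: same wait-then-remove shape as the quoted run() loop. */
    void runLauncher() {
        while (!Thread.interrupted()) {
            try {
                String taskId;
                synchronized (tasksToLaunch) {
                    while (tasksToLaunch.isEmpty()) {
                        tasksToLaunch.wait();
                    }
                    taskId = tasksToLaunch.remove(0);
                }
                gate.await(); // blocks outside the lock, so producers can still add tasks
                synchronized (launched) {
                    launched.add(taskId);
                }
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch gate = new CountDownLatch(1);
        LauncherHandoffDemo demo = new LauncherHandoffDemo(gate);
        Thread launcher = new Thread(demo::runLauncher, "demo-launcher");
        launcher.setDaemon(true);
        launcher.start();

        demo.addToTaskQueue("task-1");
        demo.addToTaskQueue("task-2");
        demo.addToTaskQueue("task-3");
        Thread.sleep(200); // let the launcher take the first task and block on the gate
        System.out.println("queued while launcher is blocked: " + demo.queued());

        gate.countDown(); // unblock the launcher; the remaining tasks then drain
        Thread.sleep(200);
        System.out.println("launched after unblocking: " + demo.launchedCount());
    }
}
```

While the gate stays closed, addToTaskQueue keeps succeeding and the list keeps growing, which would look like tasks registering and then never being picked up.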

-- 
www.calcmachine.com - easy online calculator.
