hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vivek Ratan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1462) Better progress reporting from a Task
Date Fri, 08 Jun 2007 11:25:26 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Vivek Ratan updated HADOOP-1462:

    Status: Patch Available  (was: Open)

This patch takes care of the problems listed in this issue: 
- RPC calls do not block any more
- Minimal RPC calls are made
- Progress is reported only when it is actually made

In addition, the patch handles some issues reported in HADOOP-1431: 
- we make sure we send keep-alive requests and report progress to TaskTracker when in the
middle of potentially long-running jobs, which may involve user code. Specifically, progress
is reported during the Map sort/merge phase and during the Reduce shuffle/merge phases ( I
may have the terminology wrong). 

> Better progress reporting from a Task
> -------------------------------------
>                 Key: HADOOP-1462
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1462
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Vivek Ratan
>            Assignee: Vivek Ratan
> The Task code that reports progress updates has the following problems:
> 1. Some RPC calls are blocking. For example, in MapRunner::run(), the call to RecordReader::next()
can result in a blocking RPC call to the Task Tracker (TT) to report progress. 
> 2. Some RPC calls are unnecessary. The Ping thread pings the TT once every second, while
we also independently send progress updates every second. We don't, for example, need to ping
the TT right after we send the progress update. 
> 3. In some places, we spawn a thread to send progress updates (in MapOutputBuffer::collect(),
for example). If our code gets stuck, the thread will continue sending updates to the TT and
we will never be shut down. 
> These issues, in some form or another, have been reported in HADOOP-1201 and HADOOP-1431.

> I propose we make the following changes: 
> 1. In order to make the RPC calls non-blocking, we need a thread that calls TT. This
thread, to be created early on, will make sure we make the most appropriate RPCs. It will
have access to two flags: a progress flag that indicates that the Task has made progress since
the last RPC, and a keep_alive flag that indicates that we need to let the TT know that we're
alive. This thread will also handle pings. It's logic will be something like this: 
> while (1) {
> 	if (progress_flag is set) {
> 		// report progress update
> 		umbilical::progress(...);
> 		if (failure), kill task;
> 		reset progress_flag; 
> 		reset keep_alive_flag; // calling progress() also indicates that we're alive
> 	}
> 	else if (keep_alive_flag is set) {
> 		// let TT know we're alive
> 		umbilical::progress(same params as last time);
> 		if (failure), kill task;
> 		reset keep_alive_flag;
> 		break;
> 	}
> 	else {
> 		// see if TT is alive
> 		umbilical::ping();
> 		if (failure), kill task;
> 	}
> 	sleep (1 sec);
> }
> 2. progress_flag and keep_alive_flag are set by the MapReduce code. Reporter::progress()
(in Task.java) sets keep_alive_flag while progress_flag is set whenever Task's taskProgress
object has any of its fields changed. 
> 3. We do away with Task::reportProgress() as this code is now handled in the Progress
thread. Wherever this method is called in our MapReduce kernel code, we should replace it
either with Reporter::progress() (if the intent was to let TT know that we're alive) or we
simply remove that call (if the intent was to transmit progress changes to the TT). 
> 4. TaskUmbilicalProtocol::progress() should return a boolean, and should return the same
values that TaskUmbilicalProtocol::ping() does. This will let the Task know whether its ID
is known to the TT. 
> 5. We no longer need to create a ping thread in TaskTracker::Child. However, we can perhaps
create the Progress thread in the same place the Ping thread was created. 
> 6. We will need to remove code that creates progress threads. This is in MapTask::MapOutputBuffer::collect(),
MapTask::MapOutputBuffer::flush(), and ReduceTask::ReduceCopier(), at the least. Instead,
we will need to add code that updates the progress or calls Reporter::progress(). Any of these
calls simply update flags. so there's not a lot of performance penalty (at worst, updating
progress_flag or keep_alive_flag may need to be done within a synchronized block, but even
that may not be necessary since the flags are just boolean values). As per HADOOP-1431, these
calls can be made through a ReportingComparator, or from within the generic BuferSorter, or
perhaps from some place better. 
> I may have missed out on some details, but hopefully the overall idea is clear. Comments

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message