hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: RecordReader Progress Reporting.
Date Sat, 09 Apr 2011 09:10:35 GMT
Hello Jane,

On Tue, Mar 29, 2011 at 4:40 AM, Jane Chen <jxchen_us_1999@yahoo.com> wrote:
> There are times when I don't have an accurate count of the total records to be processed,
> and I wonder the impact on task scheduling when returning an inaccurate progress percentage.
> I found that when I return either 0 when not done and 1 when done will make the job hang.

What do you mean when you say the job 'hangs' when you statically set
it to 0 or 1 always? Do you mean the task gets killed and restarted?

When the progress or the status message changes, a task status report
is sent back via the reporter to the TIP (TaskInProgress) object held
by the parent TaskTracker. If a TIP has not received a task report for
longer than the task timeout, it can go ahead and purge the task,
claiming that it has hung or gone unresponsive, and the task gets
rescheduled. The timeout is mapred.task.timeout, given in
milliseconds: 600000 (600s) by default; set it to 0 to never let it
purge.
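
To make the timeout rule concrete, here is a rough standalone sketch of
the check as described above. It is not Hadoop's actual code: the class
TaskTimeoutWatchdog and its methods are hypothetical names, and the only
things taken from the framework are the 600000 ms default and the
"0 disables the timeout" convention.

```java
// Hypothetical sketch of a report-timeout check; not Hadoop's actual code.
final class TaskTimeoutWatchdog {
    private final long timeoutMs;   // plays the role of mapred.task.timeout (milliseconds)
    private long lastReportMs;      // time at which the last status report arrived

    TaskTimeoutWatchdog(long timeoutMs, long nowMs) {
        this.timeoutMs = timeoutMs;
        this.lastReportMs = nowMs;
    }

    /** Record that a progress or status report arrived at nowMs. */
    void reportReceived(long nowMs) {
        lastReportMs = nowMs;
    }

    /** True if the task would be considered hung at nowMs. */
    boolean isTimedOut(long nowMs) {
        if (timeoutMs == 0) {
            return false;           // 0 means "never purge"
        }
        return nowMs - lastReportMs > timeoutMs;
    }

    public static void main(String[] args) {
        TaskTimeoutWatchdog w = new TaskTimeoutWatchdog(600_000L, 0L);
        System.out.println(w.isTimedOut(599_000L));   // false
        w.reportReceived(599_000L);                   // a report resets the clock
        System.out.println(w.isTimedOut(1_100_000L)); // false
        System.out.println(w.isTimedOut(1_200_000L)); // true
    }
}
```

The point of the sketch is that only the time since the last report
matters, not the progress value carried by that report.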

If you're not sure what your progress is while processing records in
your RecordReader (RR), set progress to a random value between 0.0 and
1.0; it shouldn't matter to the framework if the progress decreases in
value.
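
A minimal sketch of that suggestion, kept free of Hadoop imports so it
runs on its own. The class UnknownTotalProgress and the markDone()
method are hypothetical; in a real RecordReader the same logic would sit
in your getProgress() implementation, and I have not tested this against
a cluster.

```java
import java.util.Random;

// Hypothetical stand-in for the progress logic of a RecordReader
// that does not know its total record count.
final class UnknownTotalProgress {
    private final Random random;
    private boolean done = false;

    UnknownTotalProgress(long seed) {
        this.random = new Random(seed);
    }

    /** Call once the reader has no more records to return. */
    void markDone() {
        done = true;
    }

    /** Mirrors the shape of getProgress(): a float between 0.0 and 1.0. */
    float getProgress() {
        // Random.nextFloat() returns a value in [0.0, 1.0), so the result
        // may go down between calls; 1.0 is reported only when done.
        return done ? 1.0f : random.nextFloat();
    }

    public static void main(String[] args) {
        UnknownTotalProgress p = new UnknownTotalProgress(42L);
        System.out.println(p.getProgress()); // some value in [0.0, 1.0)
        p.markDone();
        System.out.println(p.getProgress()); // 1.0
    }
}
```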

Harsh J
