hadoop-mapreduce-user mailing list archives

From Koji Noguchi <knogu...@yahoo-inc.com>
Subject Re: streaming job in python that reports progress
Date Mon, 31 Jan 2011 17:07:11 GMT
Hi Felix,

Two options I can think of:

1) Set a longer timeout with -Dmapred.task.timeout=_____ (value in milliseconds).
or
2) Have a separate thread report status back to the TaskTracker by writing to stderr.
     https://issues.apache.org/jira/browse/HADOOP-1328
     Format:   "reporter:status:____"
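
A minimal sketch of option 2 in Python, assuming Python 2.7 or later. The function name start_status_reporter, its parameters, and the 60-second default interval are hypothetical and chosen only for illustration. The one thing taken from HADOOP-1328 is the line format "reporter:status:<message>" written to stderr. The status update should count as progress for the task, but check that against your Hadoop version.

```python
import sys
import threading


def start_status_reporter(message, interval_seconds=60.0, stream=sys.stderr):
    """Write "reporter:status:<message>" to `stream` every `interval_seconds`
    from a background thread. Returns an Event; set it to stop the thread.

    Hypothetical helper for illustration, not part of Hadoop.
    """
    stop = threading.Event()

    def run():
        # Event.wait() returns False on timeout and True once stop is set,
        # so the loop emits one status line per interval until stopped.
        while not stop.wait(interval_seconds):
            stream.write("reporter:status:%s\n" % message)
            stream.flush()  # make sure the line reaches the framework promptly

    thread = threading.Thread(target=run)
    thread.daemon = True  # do not keep the process alive if the main thread exits
    thread.start()
    return stop
```

In the streaming script, call stop = start_status_reporter("compressing") before the long I/O step and stop.set() in a finally block once it finishes. Keep the interval well below the value of mapred.task.timeout so that at least one status line arrives within each timeout window.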

Hope it works.

Koji


On 1/28/11 3:51 PM, "felix gao" <gre1600@gmail.com> wrote:

mighty user group,

I am trying to write a streaming job in Python that does a lot of I/O.  I know that if
I don't report back every x minutes the job will be terminated.  How do I report back to the
TaskTracker from my streaming Python job while it is in the middle of, for example, a gzip operation?

Thanks,

Felix

