From "Saptarshi Guha" <saptarshi.g...@gmail.com>
Subject A reporter thread during the reduce stage for a long running line
Date Fri, 09 Jan 2009 08:09:50 GMT
  Sorry for the puzzling subject. I have a single long-running
/statement/ in my reduce method, so the framework might assume my
reduce is not responding and kill it.
I solved the problem in the map method by subclassing MapRunner, and
running a thread which calls reporter.progress() every minute or so.
However, the same thread does not run during the reduce. (I checked this
by setting a status string in the thread: it appeared on the
JobTracker website during the map stage but not during the reduce
stage.)
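
For reference, the heartbeat thread described above looks roughly like
the sketch below. It is a minimal, self-contained illustration, not the
actual MapRunner subclass: ProgressSink is a hypothetical stand-in for
the progress() call on Hadoop's Reporter, and HeartbeatSketch,
startHeartbeat, runWithHeartbeat and sleepQuietly are names made up for
this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class HeartbeatSketch {
    /** Hypothetical stand-in for the progress() call on Hadoop's Reporter. */
    interface ProgressSink {
        void progress();
    }

    /** Starts a daemon thread that calls sink.progress() every intervalMillis until interrupted. */
    static Thread startHeartbeat(ProgressSink sink, long intervalMillis) {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    sink.progress();              // tell the framework we are still alive
                    Thread.sleep(intervalMillis); // wait until the next heartbeat
                }
            } catch (InterruptedException e) {
                // Interrupted while sleeping: the long-running work is done, so exit quietly.
            }
        });
        t.setDaemon(true); // never keep the JVM alive just for the heartbeat
        t.start();
        return t;
    }

    /** Runs work with a heartbeat in the background; returns how many heartbeats were sent. */
    static int runWithHeartbeat(Runnable work, long intervalMillis) {
        AtomicInteger ticks = new AtomicInteger();
        Thread heartbeat = startHeartbeat(ticks::incrementAndGet, intervalMillis);
        try {
            work.run(); // the single long-running statement
        } finally {
            heartbeat.interrupt(); // stop the heartbeat even if work throws
            try {
                heartbeat.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            }
        }
        return ticks.get();
    }

    /** Helper for demos: sleeps without forcing callers to handle the checked exception. */
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In a real job the sink would be something like reporter::progress
(assuming the Reporter instance is in scope), and the interval would be
well below the task timeout.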

Hadoop v.0.20 appears to solve this by having separate run methods for
both Map and Reduce, however I'm using v 0.19.
I scanned the Streaming source and it only subclasses MapRunner, so I
assume it too has the same limitation (I am probably wrong; if so, can
someone point me to the right location?).

Is there a way around this, /without/ starting a thread in the reduce function?
Hadoop v 0.19

Many thanks

Saptarshi Guha - saptarshi.guha@gmail.com

