hadoop-common-user mailing list archives

From Zhong Wang <wangzhong....@gmail.com>
Subject Re: What is the best way to use the Hadoop output data
Date Fri, 26 Jun 2009 08:10:05 GMT
Hi Huy,

On Thu, Jun 25, 2009 at 6:02 PM, Huy Phan<dachuy@gmail.com> wrote:
> I'm wondering if there's any performance killer in this approach, I posted
> the question to IRC channel and someone told me that there may be a
> bottleneck.

If you post your output data while the job is still running, a
communication error could block your MapReduce job. So I think it's
better to do this after the job is done.

> I wonder if there is any way to spawn a process directly from Hadoop after
> all the MapReduce tasks finish ?

How do you submit your jobs? If you call job.waitForCompletion(true)
in your main driver class, the driver blocks until the job finishes.
You can start your follow-up process right after that call, so the
two steps run one after the other.
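
A rough sketch of that pattern, in case it helps. It is not a complete
driver: the Callable stands in for the real call, which in your driver
would be () -> job.waitForCompletion(true) (it returns true when the job
succeeds). The class name PostJobDriver, the helper runThenSpawn and the
"java -version" command are only placeholders I made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;

class PostJobDriver {

    /**
     * Runs a blocking job, then spawns a follow-up process only if the job
     * succeeded. Returns 1 if the job failed, otherwise the exit code of
     * the spawned process.
     */
    static int runThenSpawn(Callable<Boolean> blockingJob, List<String> command)
            throws Exception {
        // Blocks here until the job is done. In a real Hadoop driver this
        // would wrap job.waitForCompletion(true).
        boolean succeeded = blockingJob.call();
        if (!succeeded) {
            return 1; // job failed, so do not post any output
        }
        // The job is finished, so it is safe to start the follow-up step.
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder job that always "succeeds" (assumption, not Hadoop code).
        Callable<Boolean> placeholderJob = () -> true;
        // Placeholder command; replace with whatever posts your output data.
        int exitCode = runThenSpawn(placeholderJob, Arrays.asList("java", "-version"));
        System.exit(exitCode);
    }
}
```

Because the follow-up process only starts after the blocking call
returns, a failure while posting the data cannot hold up any map or
reduce task.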

Zhong Wang
