hadoop-common-user mailing list archives

From Zhong Wang <wangzhong....@gmail.com>
Subject Re: What is the best way to use the Hadoop output data
Date Fri, 26 Jun 2009 08:10:05 GMT
Hi Huy,

On Thu, Jun 25, 2009 at 6:02 PM, Huy Phan<dachuy@gmail.com> wrote:
> I'm wondering if there's any performance killer in this approach, I posted
> the question to IRC channel and someone told me that there may be a
> bottleneck.

If you post your output data from inside the job, a communication
error could block your MapReduce job. So I think it's better to do
this after the job is done.

> I wonder if there is any way to spawn a process directly from Hadoop after
> all the MapReduce tasks finish ?
>

How do you submit your jobs? You can make the submitting process block
by calling job.waitForCompletion(true) in your main driver class. The
call returns only when the job has finished, so anything you run after
it in the driver happens strictly after the job.
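
Here is a rough sketch of that pattern, not a tested driver. So that it
compiles without Hadoop on the classpath, a small hypothetical interface
(BlockingJob) stands in for the one call used on
org.apache.hadoop.mapreduce.Job; the names PostJobRunner, runThenPost and
postStep are made up for illustration. In a real driver you would
presumably pass job::waitForCompletion and put your upload (or a spawned
process) in the post step.

```java
public class PostJobRunner {

    /** Hypothetical stand-in for Job.waitForCompletion(boolean). */
    @FunctionalInterface
    public interface BlockingJob {
        boolean waitForCompletion(boolean verbose) throws Exception;
    }

    /**
     * Blocks until the job finishes, then runs the post step only if the
     * job reported success. Returns 0 on success and 1 on failure, in the
     * style of a process exit code.
     */
    public static int runThenPost(BlockingJob job, Runnable postStep) throws Exception {
        // Does not return until the job is done; 'true' asks for progress output.
        boolean succeeded = job.waitForCompletion(true);
        if (!succeeded) {
            return 1; // skip the post step when the job failed
        }
        postStep.run(); // e.g. post the output data somewhere
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // Fake job and post step, only to show the ordering.
        int exitCode = runThenPost(
                verbose -> {
                    System.out.println("job running");
                    return true;
                },
                () -> System.out.println("posting output data"));
        System.out.println("exit code: " + exitCode);
    }
}
```

Since the post step runs in the driver and not in a map or reduce task,
a failure while posting cannot stall the job itself.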


-- 
Zhong Wang
