From Amogh Vasekar <am...@yahoo-inc.com>
Subject Re: Barrier between reduce and map of the next round
Date Tue, 09 Feb 2010 05:41:18 GMT
```Hi,
>>m1 | r1 m2 | r2 m3 | ... | r(K-1) mK | rK m(K+1)
My understanding is it would be something like:
m1|(r1 m2)| m(identity) | r2, if you combine the r(i) and m(i+1), because of the hard distinction
between Rs & Ms.

Amogh

On 2/4/10 1:46 PM, "Felix Halim" <felix.halim@gmail.com> wrote:

Talking about barriers, currently there are barriers between everything:

m1 | r1 | m2 | r2 | ... | mK | rK

where | is the barrier.

I'm saying that the barrier between ri and m(i+1) is not necessary.
So it should go like this:

m1 | r1 m2 | r2 m3 | ... | r(K-1) mK | rK m(K+1)

Here the result of m(K+1) is thrown away.
We take the result of rK only.

The shuffling is needed only between mi and ri.
There is no shuffling needed for ri and m(i+1).

Thus by removing the barrier between ri and m(i+1), the overall job

Now the question is, can this be done using Chaining?
AFAIK, the chaining has to be defined before the job is started, right?
But because I don't know the value of K beforehand,
I want the chain to continue until some counter in the reduce task is zero.

Felix Halim

On Thu, Feb 4, 2010 at 3:53 PM, Amogh Vasekar <amogh@yahoo-inc.com> wrote:
>
>>>However, from ri to m(i+1) there is an unnecessary barrier. m(i+1) should
>>> not need to wait for all reducers ri to finish, right?
>
> Yes, but r(i+1) can't be in the same job, since that requires another sort
> and shuffle phase (barrier). So you would end up doing, job(i):
> m(i)r(i)m(i+1). Job(i+1): m(identity)r(i+1). Of course, this is assuming
> you can't do r(i+1) in m(identity), for if you can then it doesn't need sort
> and shuffle, and hence your job would again be of the form m+rm* :)
>
> Amogh
>
> On 2/4/10 10:19 AM, "Felix Halim" <felix.halim@gmail.com> wrote:
>
> Hi Ed,
>
> Currently my program is like this: m1, r1, m2, r2, ..., mK, rK. The
> barrier between mi and ri is acceptable since the reducer has to wait for
> all map tasks to finish. However, from ri to m(i+1) there is an
> unnecessary barrier. m(i+1) should not need to wait for all reducers
> ri to finish, right?
>
> Currently, I create one Job for each mi,ri pair, so I have a total of K
> jobs. Is there a way to chain them all together into a single Job?
> However, I don't know the value of K in advance. It has to be checked
> after each ri. So I'm thinking that the job can speculatively run the
> chain over and over until it discovers that some counter in ri is zero
> (so the result of m(K+1) is discarded, and the final result of rK is
> taken).
>
> Felix Halim
>
>
> On Thu, Feb 4, 2010 at 12:25 PM, Ed Mazur <mazur@cs.umass.edu> wrote:
>> Felix,
>>
>> You can use ChainMapper and ChainReducer to create jobs of the form
>> M+RM*. Is that what you're looking for? I'm not aware of anything that
>> allows you to have multiple reduce functions without the job
>> "barrier".
>>
>> Ed
>>
>> On Wed, Feb 3, 2010 at 9:41 PM, Felix Halim <felix.halim@gmail.com> wrote:
>>> Hi all,
>>>
>>> As far as I know, a barrier exists between the map and reduce functions in
>>> one round of MR. There is another barrier for the reducers to end the
>>> job for that round. However, if we want to run several rounds using
>>> the same map and reduce functions, then the barrier between reduce and
>>> the map of the next round is NOT necessary, right? Since the reducer
>>> only outputs a single value for each key, it may as well run
>>> a map task for the next round immediately rather than waiting for all
>>> reducers to finish. This way, the utilization of the machines between
>>> rounds can be improved.
>>>
>>> Is there a setting in Hadoop to do that?
>>>
>>> Felix Halim
>>>
>>
>
>

```
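The schedule Felix describes (m1 | r1 m2 | r2 m3 | ... with the loop ending when a reduce-side counter reaches zero) can be illustrated with a small in-memory simulation. This is only a sketch and does not use Hadoop. The toy graph, the min-label propagation workload, and the names `GRAPH`, `map_fn`, `reduce_fn`, `shuffle`, and `run` are all assumptions made up for illustration. The point it tries to show is that the shuffle is the only place where every key has to wait for all others, and that each reducer's output can be fed into the next round's map straight away.

```python
from collections import defaultdict

# Hypothetical toy graph: node -> neighbours (undirected, listed both ways).
GRAPH = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}


def map_fn(node, label):
    """m(i): send this node's label to itself and to each neighbour."""
    yield node, label
    for neighbour in GRAPH[node]:
        yield neighbour, label


def reduce_fn(labels, old_label):
    """r(i): keep the smallest label seen; also report 1 if it changed."""
    new_label = min(labels)
    return new_label, int(new_label != old_label)


def shuffle(pairs):
    """The one real barrier: group every map output by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def run(labels):
    """Simulate m1 | r1 m2 | r2 m3 | ... until the 'changed' counter is zero."""
    rounds = 0
    # m1 is the only map phase that runs on its own.
    pending = [kv for node, lab in labels.items() for kv in map_fn(node, lab)]
    while True:
        groups = shuffle(pending)  # barrier between m(i) and r(i)
        rounds += 1
        changed = 0                # stands in for a job counter
        pending = []
        new_labels = {}
        for node, values in groups.items():
            new_label, delta = reduce_fn(values, labels[node])  # r(i)
            changed += delta
            new_labels[node] = new_label
            # m(i+1) runs on this key immediately, without waiting
            # for the other reducers of round i to finish.
            pending.extend(map_fn(node, new_label))
        labels = new_labels
        if changed == 0:
            # 'pending' now holds the output of m(K+1); it is discarded
            # and only the result of rK is returned.
            return labels, rounds


if __name__ == "__main__":
    final_labels, k = run({node: node for node in GRAPH})
    print(final_labels, k)
```

In this simulation the counter is only known once every reducer of a round has finished, so the decision to stop still happens at a per-round synchronization point. What the fused form saves is the separate wait between r(i) and m(i+1), at the cost of one speculative map phase whose output is dropped.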