hadoop-user mailing list archives

From Patai Sangbutsarakum <silvianhad...@gmail.com>
Subject Re: where reduce is copying?
Date Thu, 28 Feb 2013 07:23:39 GMT
Thanks Harsh, you're always the first.

Yeah, that really makes sense: the running reduce task attempt copies
the mappers' output inbound to its own host.

The speed of 0.44 MB/s seems pretty low to me.
I am trying to decide whether that is because there is not much data to
copy yet, since not all the mappers finish at the same time,
or whether it is a problem with the network itself. (I already checked that bond0 is 1 Gb.)
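
As a rough sanity check on those two explanations, the status line can be compared against the link capacity. This is only a sketch: it assumes bond0 really delivers 1 Gb/s, ignores protocol overhead, treats MB as 2**20 bytes, and assumes the rate shown is an average over the time since the copy phase started (so idle time spent waiting for maps would pull it down). The regular expression and variable names are illustrative, not part of Hadoop.

```python
import re

# Status string as shown on the JobTracker page in the original question.
status = "reduce > copy (136 of 261 at 0.44 MB/s)"

# Illustrative pattern for this one status format (an assumption, not a Hadoop API).
match = re.search(r"copy \((\d+) of (\d+) at ([\d.]+) MB/s\)", status)
copied = int(match.group(1))         # map outputs fetched so far
total = int(match.group(2))          # map outputs this reduce must fetch
rate_mb_s = float(match.group(3))    # reported copy rate

# Assumption: a 1 Gb/s link with no overhead, MB = 2**20 bytes.
link_mb_s = 1e9 / 8 / 2**20

fraction_fetched = copied / total
fraction_of_link = rate_mb_s / link_mb_s

print(f"map outputs fetched: {copied}/{total} ({fraction_fetched:.0%})")
print(f"reported rate as a share of the link: {fraction_of_link:.2%}")
```

If the share of the link comes out far below 1% while only about half of the map outputs have been fetched, waiting on unfinished maps is at least a plausible explanation, although this check alone does not rule out a network problem.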

Thanks
Patai


On Wed, Feb 27, 2013 at 11:06 PM, Harsh J <harsh@cloudera.com> wrote:
> The latter (from other machines, inbound to where the reduce is
> running, onto the reduce's local disk, via mapred.local.dir). The
> reduce will, obviously, copy outputs from all maps that may have
> produced data for its assigned partition ID.
>
> On Thu, Feb 28, 2013 at 12:27 PM, Patai Sangbutsarakum
> <silvianhadoop@gmail.com> wrote:
>> Good evening Hadoopers!
>>
>> On the JobTracker page, if I click on a job and then on a running reduce
>> task, I see:
>>
>> task_201302271736_0638_r_000000 reduce > copy (136 of 261 at 0.44 MB/s)
>>
>> I am really curious where the data is being copied.
>> If I click on the task, it shows the host that is running the task attempt.
>>
>> The question is whether "reduce > copy" refers to data copied outbound from the host
>> that is running the task attempt, or
>> to data being copied from other machines inbound to this
>> host (the one running the task attempt).
>>
>> And in either case, how do I know which machines that host is copying data from/to?
>>
>> Regards,
>> Patai
>
>
>
> --
> Harsh J
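
Harsh's point that a reduce copies from every map that produced data for its partition ID can be illustrated with a toy model. This is a sketch under stated assumptions: the host names, keys, and the byte-sum hash are all hypothetical stand-ins (Hadoop's default HashPartitioner uses the key's hashCode modulo the number of reduce tasks), and nothing here queries a real cluster.

```python
NUM_REDUCES = 4  # hypothetical number of reduce tasks

def partition(key: str, num_reduces: int) -> int:
    """Toy partitioner: a deterministic stand-in for a hash partitioner."""
    return sum(key.encode("utf-8")) % num_reduces

# Hypothetical map-side hosts and the keys their map tasks emitted.
map_outputs = {
    "host-a": ["apple", "banana"],
    "host-b": ["cherry"],
    "host-c": ["apple", "date"],
}

def sources_for(reduce_id: int, outputs: dict, num_reduces: int) -> list:
    """Hosts whose map output contains at least one key for this partition."""
    return sorted(
        host
        for host, keys in outputs.items()
        if any(partition(k, num_reduces) == reduce_id for k in keys)
    )

for reduce_id in range(NUM_REDUCES):
    hosts = sources_for(reduce_id, map_outputs, NUM_REDUCES)
    print(f"reduce {reduce_id} copies inbound from: {hosts}")
```

In this model the copy direction is always inbound to the reduce, and the set of source hosts depends only on which maps emitted keys for that partition.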
