hadoop-yarn-dev mailing list archives

From "Костарев А.Ф." <...@ics.perm.ru>
Subject Re: Algorithm of distribution Map and Reduce tasks at various topology of a network
Date Fri, 12 Jul 2013 03:42:14 GMT
On 07/09/2013 05:36 PM, "Костарев А.Ф." wrote:
Hi Junping!

We have launched MapReduce jobs on a YARN cluster. Its topology is 
described in the file topology.data. All files are here: 
We found a difference between MRv1 and YARN. In classic MapReduce (MRv1) 
tasks were executed on every node, but in YARN they are executed only 
within one datacenter. This is shown in the screenshots. The input file 
had a replication factor of 5.
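
For readers without access to the attached files, here is a minimal sketch of how a topology.data file is typically consumed by a Hadoop topology script. Hadoop calls the script with host names or IP addresses as arguments and expects one rack path per argument on standard output. The file contents below are an assumption reconstructed from the diagram in the original message (in particular, the rack of b2 is a guess, since the diagram lists b3 twice), and the "host rack-path" line format is the common convention, not necessarily the one used on this cluster. The names `TOPOLOGY_DATA`, `parse_topology` and `resolve` are hypothetical.

```python
#!/usr/bin/env python3
"""Illustrative topology script: maps host names to rack paths."""
import sys

# Rack path Hadoop uses for hosts it cannot resolve.
DEFAULT_RACK = "/default-rack"

# ASSUMPTION: stand-in for the real topology.data, one "host rack-path" per line.
# The placement of b2 is a guess; the original diagram does not show it.
TOPOLOGY_DATA = """\
a1 /dc1/rack1
a2 /dc1/rack1
b1 /dc2/rack1
b2 /dc2/rack1
b3 /dc2/rack2
"""


def parse_topology(text):
    """Build a host -> rack path mapping, skipping blank and comment lines."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        host, rack = line.split()[:2]
        mapping[host] = rack
    return mapping


def resolve(hosts, mapping):
    """Return one rack path per host, falling back to the default rack."""
    return [mapping.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Print the rack paths for the hosts given on the command line.
    print(" ".join(resolve(sys.argv[1:], parse_topology(TOPOLOGY_DATA))))
```

With this mapping YARN would see three racks, "/dc1/rack1", "/dc2/rack1" and "/dc2/rack2", which matches Junping's reading of the topology quoted below.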

So, is it possible to run a single job not only in one datacenter but on 
all servers of the cluster at the same time?

One more question. With MapReduce, the output directory in HDFS 
contained _SUCCESS, log and part-r-xxxx. Now I can't find the log. Is it 
in another place, or is there no such file in YARN?

> Thank you for your prompt response.
> We will try to repeat the test on Thursday and show more details.
> On 07/09/2013 05:18 PM, Jun Ping Du wrote:
>> Hi Костарев,
>>    I think it should work for YARN, even though YARN doesn't support 
>> a layer above the rack (actually, I am working on supporting 
>> multi-layer topologies for YARN in YARN-18 now).
>>    Current YARN should just recognize your topology as three racks: 
>> "dc1/rack1", "dc2/rack1", "dc2/rack2". Each node (NM) with free 
>> resources should be assigned containers on its heartbeat with the RM, 
>> no matter what the locality level is. The only exceptions should be: 
>> 1. there are no pending resource requests; 2. the NM capacity is too 
>> small to meet the resource request; 3. delay scheduling is enabled 
>> and there is no data-local attempt. In your case, I don't see 
>> anything that would stop task assignment on a1 and a2. Anyone here 
>> can correct me if I have misunderstood anything. :)
>>    Anyway, I will give it a try (with your configuration) later to 
>> see whether there are bugs in boundary cases or whether it could be 
>> a misconfiguration. Which minor version (2.0.x or trunk) are you 
>> using now?
>> Thanks,
>> Junping
>> ----- Original Message -----
>> From: "Костарев А.Ф." <kaf@ics.perm.ru>
>> To: yarn-dev@hadoop.apache.org
>> Sent: Tuesday, July 9, 2013 5:48:49 PM
>> Subject: Algorithm of distribution Map and Reduce tasks at various 
>> topology of a network
>> Hi
>> I have a cluster in two datacenters:
>>             CLUSTER
>>                |
>>       +--------+---------+
>>       |                  |
>> datacenter1        datacenter2
>>       |                  |
>>     rack1               rack1
>>         |                |  |
>>         +-a1             |  +-b1
>>         |                |  |
>>         +-a2             |  +-b3
>>                          |
>>                         rack2
>>                             +-b3
>> The cluster has a file with replication factor 5.
>> The blocks of all files reside on all servers of the cluster.
>> When I work with standard MapReduce (MRv1) (invoked on b1), Map and
>> Reduce tasks run on all servers: b1, b2, b3, a1, a2.
>> When I work with YARN (MRv2) (invoked on b1), Map and Reduce tasks run
>> only on b1, b2, b3.
>> Can I run Map tasks on all servers in YARN?
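
Junping's reply quoted above lists three cases in which a node with free resources still gets no container on its heartbeat. The sketch below restates those three checks as a small function, purely to make the reasoning concrete. It is not the YARN scheduler's code: `Node`, `Request`, `skip_reason`, the memory-only resource model and the `delay_threshold` counter are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_mb: int  # memory still available on this NodeManager


@dataclass
class Request:
    memory_mb: int
    preferred_hosts: frozenset  # hosts holding the input data
    missed_opportunities: int = 0  # heartbeats already skipped for locality


def skip_reason(node, pending, delay_threshold):
    """Return why `node` gets no container on this heartbeat, or None.

    ASSUMPTION: delay_threshold == 0 models delay scheduling being disabled.
    """
    # Case 1: nothing is waiting to be scheduled.
    if not pending:
        return "no pending resource requests"
    # Case 2: no pending request fits into the node's free capacity.
    fitting = [r for r in pending if r.memory_mb <= node.free_mb]
    if not fitting:
        return "node capacity too small"
    # Case 3: delay scheduling holds back non-local requests for a while.
    for request in fitting:
        is_local = node.name in request.preferred_hosts
        waited_enough = request.missed_opportunities >= delay_threshold
        if is_local or waited_enough:
            return None  # a container can be assigned here
    return "delay scheduling: waiting for a data-local node"
```

In the setup described above every block has a replica on every server (replication factor 5 on five nodes), so under this model a request would be data-local on a1 and a2 as well, and the third case alone would not explain why they stay idle.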

Consultant, 1st category
Костарев А.Ф.
