hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
Date Fri, 26 Jul 2013 17:39:49 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jason Lowe updated MAPREDUCE-5423:
----------------------------------

          Component/s: mrv2
    Affects Version/s: 2.0.2-alpha
    
> Rare deadlock situation when reducers try to fetch map output
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-5423
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.0.2-alpha
>            Reporter: Chu Tong
>
> During our cluster deployment, we found there is a very rare deadlock situation when
reducers try to fetch map output. We had 5 fetchers and log snippet illustrates this problem
is below (all fetchers went into a wait state after they can't acquire more RAM beyond the
memoryLimit and no fetcher is releasing memory):
> 2013-07-18 04:32:28,135 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManager:
MergerManager: memoryLimit=1503238528, maxSingleShuffleLimit=375809632, mergeThreshold=992137472,
ioSortFactor=10, memToMemMergeOutputsThreshold=10
> 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher:
attempt_1373902166027_0622_r_000001_0 Thread started: EventFetcher for fetching Map Completion
Events
> 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher:
attempt_1373902166027_0622_r_000001_0: Got 1 new map-outputs
> 2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:28,319 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
for url=8080/mapOutput?job=job_1373902166027_0622&amp;reduce=1&amp;map=attempt_1373902166027_0622_m_000017_0
sent hash and receievd reply
> 2013-07-18 04:32:28,320 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_000017_0 decomp: 27
len: 31 to MEMORY
> 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
Read 27 bytes from map-output for attempt_1373902166027_0622_m_000017_0
> 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager:
closeInMemoryFile -&gt; map-output of size: 27, inMemoryMapOutputs.size() -&gt; 1,
commitMemory -&gt; 0, usedMemory -&gt;27
> 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s
> 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher:
attempt_1373902166027_0622_r_000001_0: Got 1 new map-outputs
> 2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:33,161 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
for url=8080/mapOutput?job=job_1373902166027_0622&amp;reduce=1&amp;map=attempt_1373902166027_0622_m_000016_0
sent hash and receievd reply
> 2013-07-18 04:32:33,200 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_000016_0 decomp: 55841282
len: 55841286 to MEMORY
> 2013-07-18 04:32:33,322 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
Read 55841282 bytes from map-output for attempt_1373902166027_0622_m_000016_0
> 2013-07-18 04:32:33,323 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager:
closeInMemoryFile -&gt; map-output of size: 55841282, inMemoryMapOutputs.size() -&gt;
2, commitMemory -&gt; 27, usedMemory -&gt;55841309
> 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
Read 118022137 bytes from map-output for attempt_1373902166027_0622_m_000015_0
> 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager:
closeInMemoryFile -&gt; map-output of size: 118022137, inMemoryMapOutputs.size() -&gt;
3, commitMemory -&gt; 55841309, usedMemory -&gt;173863446
> 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 413s
> 2013-07-18 04:32:42,188 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher:
attempt_1373902166027_0622_r_000001_0: Got 1 new map-outputs
> 2013-07-18 04:32:42,188 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:42,188 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:42,190 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
for url=8080/mapOutput?job=job_1373902166027_0622&amp;reduce=1&amp;map=attempt_1373902166027_0622_m_000014_0
sent hash and receievd reply
> 2013-07-18 04:32:42,277 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_000014_0 decomp: 140715962
len: 140715966 to MEMORY
> 2013-07-18 04:32:42,493 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
Read 140715962 bytes from map-output for attempt_1373902166027_0622_m_000014_0
> 2013-07-18 04:32:42,493 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager:
closeInMemoryFile -&gt; map-output of size: 140715962, inMemoryMapOutputs.size() -&gt;
4, commitMemory -&gt; 173863446, usedMemory -&gt;314579408
> 2013-07-18 04:32:42,494 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 306s
> 2013-07-18 04:32:43,192 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher:
attempt_1373902166027_0622_r_000001_0: Got 1 new map-outputs
> 2013-07-18 04:32:43,192 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:43,192 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:43,195 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
for url=8080/mapOutput?job=job_1373902166027_0622&amp;reduce=1&amp;map=attempt_1373902166027_0622_m_000013_0
sent hash and receievd reply
> 2013-07-18 04:32:43,280 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_000013_0 decomp: 141243082
len: 141243086 to MEMORY
> 2013-07-18 04:32:43,506 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
Read 141243082 bytes from map-output for attempt_1373902166027_0622_m_000013_0
> 2013-07-18 04:32:43,506 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager:
closeInMemoryFile -&gt; map-output of size: 141243082, inMemoryMapOutputs.size() -&gt;
5, commitMemory -&gt; 314579408, usedMemory -&gt;455822490
> 2013-07-18 04:32:43,507 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 315s
> 2013-07-18 04:32:44,195 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher:
attempt_1373902166027_0622_r_000001_0: Got 1 new map-outputs
> 2013-07-18 04:32:44,195 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:44,195 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:44,198 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
for url=8080/mapOutput?job=job_1373902166027_0622&amp;reduce=1&amp;map=attempt_1373902166027_0622_m_000011_0
sent hash and receievd reply
> 2013-07-18 04:32:44,305 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_000011_0 decomp: 173528412
len: 173528416 to MEMORY
> ...
> 2013-07-18 04:32:56,901 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
Read 282474777 bytes from map-output for attempt_1373902166027_0622_m_000001_0
> 2013-07-18 04:32:56,901 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.MergeManager:
closeInMemoryFile -&gt; map-output of size: 282474777, inMemoryMapOutputs.size() -&gt;
5, commitMemory -&gt; 1179552807, usedMemory -&gt;1462027584
> 2013-07-18 04:32:56,901 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
101-09-04.sc1.verticloud.com:8080 freed by fetcher#2 in 2682s
> 2013-07-18 04:32:56,901 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
Assiging 101-09-04.sc1.verticloud.com:8080 with 4 to fetcher#4
> 2013-07-18 04:32:56,902 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
assigned 4 of 4 to 101-09-04.sc1.verticloud.com:8080 to fetcher#4
> 2013-07-18 04:32:56,904 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
for url=8080/mapOutput?job=job_1373902166027_0622&amp;reduce=1&amp;map=attempt_1373902166027_0622_m_000006_0,attempt_1373902166027_0622_m_000002_0,attempt_1373902166027_0622_m_000003_0,attempt_1373902166027_0622_m_000005_0
sent hash and receievd reply
> 2013-07-18 04:32:57,336 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher:
attempt_1373902166027_0622_r_000001_0: Got 1 new map-outputs
> 2013-07-18 04:32:57,414 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
fetcher#4 about to shuffle output of map attempt_1373902166027_0622_m_000006_0 decomp: 280156692
len: 280156696 to MEMORY
> 2013-07-18 04:32:57,867 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
Read 280156692 bytes from map-output for attempt_1373902166027_0622_m_000006_0
> 2013-07-18 04:32:57,867 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.MergeManager:
closeInMemoryFile -&gt; map-output of size: 280156692, inMemoryMapOutputs.size() -&gt;
6, commitMemory -&gt; 1462027584, usedMemory -&gt;1742184276
> 2013-07-18 04:32:57,900 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
fetcher#4 - MergerManager returned Status.WAIT ...
> 2013-07-18 04:32:57,901 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
101-09-04.sc1.verticloud.com:8080 freed by fetcher#4 in 999s
> 2013-07-18 04:32:57,901 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
Assiging 101-09-04.sc1.verticloud.com:8080 with 4 to fetcher#3
> 2013-07-18 04:32:57,901 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
assigned 4 of 4 to 101-09-04.sc1.verticloud.com:8080 to fetcher#3
> 2013-07-18 04:32:57,903 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
for url=8080/mapOutput?job=job_1373902166027_0622&amp;reduce=1&amp;map=attempt_1373902166027_0622_m_000000_0,attempt_1373902166027_0622_m_000002_0,attempt_1373902166027_0622_m_000005_0,attempt_1373902166027_0622_m_000003_0
sent hash and receievd reply
> 2013-07-18 04:32:57,904 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher:
fetcher#3 - MergerManager returned Status.WAIT ...
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message