hadoop-mapreduce-issues mailing list archives

From "zhaoyunjiong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5368) Save memory by set capacity, load factor and concurrency level for ConcurrentHashMap in TaskInProgress
Date Wed, 03 Jul 2013 00:43:20 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698454#comment-13698454 ]

zhaoyunjiong commented on MAPREDUCE-5368:
-----------------------------------------

Normally taskLocality and taskAvataar won't exceed 4 entries each (mapred.map.max.attempts
and mapred.reduce.max.attempts are 4), and most of them will hold only 1.
With the default initial capacity and concurrency level, the number of NonfairSync,
Segment and HashEntry[] instances is about 32 times the number of TaskInProgress instances
(2 maps per TaskInProgress, 16 segments per map), which consumes a lot of memory, as shown
in the description above.
There will also be very few concurrent accesses to taskLocality and taskAvataar. I was
actually thinking of reducing the concurrency level to 1, or even replacing ConcurrentHashMap.
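
A minimal sketch of the sizing idea, for illustration only. The values below
(capacity 4, load factor 0.75, concurrency level 1) are assumptions drawn from
the reasoning above, not taken from the attached patch, and String keys and
values stand in for TaskAttemptID, Locality and Avataar so the example compiles
without Hadoop on the classpath. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SmallConcurrentMaps {
    // Assumed sizing, not copied from MAPREDUCE-5368.patch.
    static final int INITIAL_CAPACITY = 4;   // at most 4 attempts per task
    static final float LOAD_FACTOR = 0.75f;  // ConcurrentHashMap's documented default
    static final int CONCURRENCY_LEVEL = 1;  // very few concurrent writers expected

    // Hypothetical helper: builds a map sized for a handful of entries
    // using the three-argument ConcurrentHashMap constructor.
    static <K, V> Map<K, V> newSmallMap() {
        return new ConcurrentHashMap<K, V>(
            INITIAL_CAPACITY, LOAD_FACTOR, CONCURRENCY_LEVEL);
    }

    public static void main(String[] args) {
        // Stand-ins for Map<TaskAttemptID, Locality> taskLocality.
        Map<String, String> taskLocality = newSmallMap();
        taskLocality.put("attempt_0", "NODE_LOCAL");
        taskLocality.put("attempt_1", "RACK_LOCAL");
        System.out.println(taskLocality.size());
    }
}
```

With a concurrency level of 1, the pre-Java 8 ConcurrentHashMap allocates a
single segment per map instead of 16, which is where the saving would come from.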
                
> Save memory by  set capacity, load factor and concurrency level for ConcurrentHashMap
in TaskInProgress
> -------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5368
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5368
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.2.0
>            Reporter: zhaoyunjiong
>             Fix For: 1.2.1
>
>         Attachments: MAPREDUCE-5368.patch
>
>
> Below is the heap histogram from our JobTracker:
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:     136048824    11347237456  [C
>    2:     124156992     5959535616  java.util.concurrent.locks.ReentrantLock$NonfairSync
>    3:     124156973     5959534704  java.util.concurrent.ConcurrentHashMap$Segment
>    4:     135887753     5435510120  java.lang.String
>    5:     124213692     3975044400  [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
>    6:      63777311     3061310928  java.util.HashMap$Entry
>    7:      35038252     2803060160  java.util.TreeMap
>    8:      16921110     2712480072  [Ljava.util.HashMap$Entry;
>    9:       4803617     2420449192  [Ljava.lang.Object;
>   10:      50392816     2015712640  org.apache.hadoop.mapred.Counters$Counter
>   11:       7775438     1181866576  [Ljava.util.concurrent.ConcurrentHashMap$Segment;
>   12:       3882847     1118259936  org.apache.hadoop.mapred.TaskInProgress
> ConcurrentHashMap takes more than 14G (5959535616 + 5959534704 + 3975044400 bytes).
> The culprit is the code below in TaskInProgress.java:
>   Map<TaskAttemptID, Locality> taskLocality = 
>       new ConcurrentHashMap<TaskAttemptID, Locality>();
>   Map<TaskAttemptID, Avataar> taskAvataar = 
>       new ConcurrentHashMap<TaskAttemptID, Avataar>();
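
The figures in the description can be cross-checked with a little arithmetic.
The sketch below uses only numbers copied from the histogram; the class and
method names are hypothetical. It sums the three ConcurrentHashMap-related rows
and divides the Segment count by the TaskInProgress count, which should come
out close to 32 (2 maps per TaskInProgress, 16 segments per map by default).

```java
class HistogramCheck {
    // Values copied from the histogram rows in the issue description.
    static final long TASK_IN_PROGRESS = 3882847L;
    static final long SEGMENTS = 124156973L;
    static final long NONFAIR_SYNC_BYTES = 5959535616L;
    static final long SEGMENT_BYTES = 5959534704L;
    static final long HASH_ENTRY_ARRAY_BYTES = 3975044400L;

    // Total bytes held by the three ConcurrentHashMap-related classes.
    static long totalBytes() {
        return NONFAIR_SYNC_BYTES + SEGMENT_BYTES + HASH_ENTRY_ARRAY_BYTES;
    }

    // Average number of Segment instances per TaskInProgress.
    static double segmentsPerTask() {
        return (double) SEGMENTS / TASK_IN_PROGRESS;
    }

    public static void main(String[] args) {
        double gib = totalBytes() / (1024.0 * 1024.0 * 1024.0);
        System.out.printf("total bytes: %d (%.2f GiB)%n", totalBytes(), gib);
        System.out.printf("segments per TaskInProgress: %.2f%n", segmentsPerTask());
    }
}
```

The total is 15894114720 bytes, which is a little under 15 GiB and consistent
with the "more than 14G" figure quoted above.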

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
