hadoop-mapreduce-issues mailing list archives

From "Lin Yiqun (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6551) Dynamic adjust mapTaskAttempt memory size
Date Wed, 18 Nov 2015 13:19:11 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010955#comment-15010955

Lin Yiqun commented on MAPREDUCE-6551:

I added 3 new config settings:
* MRJobConfig.MAP_MEMORY_MB_AUTOSET_ENABLED: whether to enable this function.
* MRJobConfig.MAP_UNIT_INPUT_LENGTH: the standard unit of input data length, i.e. the amount of data handled with the standard memory size.
If the auto-set function is enabled, the {{MapTaskAttemptImpl#autoSetMemorySize}} method adjusts
the memory size according to the dataLength of its {{splitInfo}}. If dataLength is larger than UNIT_INPUT_LENGTH,
the memory size will be larger; otherwise it will be smaller.
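The adjustment described above can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the patch attached to MAPREDUCE-6551: the class and method names are hypothetical, the scaling rule (memory proportional to inputDataLength / unit input length) is assumed, and the min/max clamping and rounding to an allocation increment are additions of this sketch.

```java
/**
 * Hypothetical sketch of input-length-based map memory sizing.
 * Assumption: memory scales linearly with the split's input length;
 * this is NOT the actual MapTaskAttemptImpl#autoSetMemorySize code.
 */
public final class MapMemoryAutoSet {

    private MapMemoryAutoSet() {}

    /**
     * @param inputDataLength  bytes in the split (e.g. from TaskSplitMetaInfo#getInputDataLength)
     * @param unitInputLength  bytes a standard-sized map task is expected to handle (assumed > 0)
     * @param standardMemoryMb memory for a split of exactly unitInputLength bytes
     * @param minMemoryMb      lower bound and assumed allocation increment (must be > 0)
     * @param maxMemoryMb      upper bound for a single map task
     * @return memory in MB, rounded up to a multiple of minMemoryMb and clamped to [min, max]
     */
    public static int autoSetMemoryMb(long inputDataLength, long unitInputLength,
                                      int standardMemoryMb, int minMemoryMb, int maxMemoryMb) {
        if (unitInputLength <= 0 || minMemoryMb <= 0) {
            throw new IllegalArgumentException("unitInputLength and minMemoryMb must be positive");
        }
        // Linear scaling: memory / standardMemory == inputDataLength / unitInputLength.
        double scaled = (double) standardMemoryMb * inputDataLength / unitInputLength;
        // Round up to the next multiple of minMemoryMb (the assumed allocation increment).
        long rounded = (long) Math.ceil(scaled / minMemoryMb) * minMemoryMb;
        // Keep the result within [minMemoryMb, maxMemoryMb].
        return (int) Math.max(minMemoryMb, Math.min(maxMemoryMb, rounded));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long unit = 128 * mb; // hypothetical MAP_UNIT_INPUT_LENGTH of 128 MB
        System.out.println(autoSetMemoryMb(512 * 1024, unit, 1024, 256, 8192)); // 512 KB split
        System.out.println(autoSetMemoryMb(128 * mb, unit, 1024, 256, 8192));   // one unit
        System.out.println(autoSetMemoryMb(512 * mb, unit, 1024, 256, 8192));   // four units
    }
}
```

With these assumed values the program prints 256, 1024 and 4096: a tiny split drops to the floor, a split of exactly one unit gets the standard memory, and a larger split gets proportionally more. Clamping keeps a very small split from requesting near-zero memory and a very large one from exceeding what a single container may be granted.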

> Dynamic adjust mapTaskAttempt memory size
> -----------------------------------------
>                 Key: MAPREDUCE-6551
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6551
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 2.7.1
>            Reporter: Lin Yiqun
>            Assignee: Lin Yiqun
> I found a scenario in which map tasks consume too much of the cluster's resources. This
happens when there are many small file blocks (some even smaller than 1 MB), which leads to
many map tasks being launched to read them. In general, a map task attempt uses the default
config {{MRJobConfig#MAP_MEMORY_MB}} to set the memory of its resourceCapability for processing
its data. This causes a problem: the map tasks consume a lot of memory even though the data
they target is small. So I have an idea: could we dynamically set the mapTaskAttempt memory size
based on its inputDataLength? This value can be provided by the {{TaskSplitMetaInfo#getInputDataLength}}
method. Besides that, we should provide a standard unit dataLength corresponding to a standard memory size.
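To make the cost of fixed sizing concrete, the small sketch below computes the total memory requested when every map attempt gets the same MAP_MEMORY_MB regardless of its split size. The job shape (1000 small files of 512 KB each, one map task per file) and the 1024 MB per-task value are hypothetical numbers chosen for illustration, not measurements from the reporter's cluster.

```java
import java.util.Collections;
import java.util.List;

/** Illustrative arithmetic only: the split sizes and memory value are hypothetical. */
public final class FixedMapMemoryCost {

    private FixedMapMemoryCost() {}

    /** Total memory (MB) requested when every map attempt gets the same fixed size. */
    public static long totalRequestedMb(List<Long> splitLengths, int mapMemoryMb) {
        // One container per split; the request ignores how many bytes the split holds.
        return (long) splitLengths.size() * mapMemoryMb;
    }

    /** Total input bytes across all splits. */
    public static long totalInputBytes(List<Long> splitLengths) {
        long sum = 0;
        for (long len : splitLengths) {
            sum += len;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical job: 1000 small files, each a single 512 KB block -> 1000 map tasks.
        List<Long> splits = Collections.nCopies(1000, 512L * 1024);
        int mapMemoryMb = 1024; // assumed per-task value of MAP_MEMORY_MB
        long requestedMb = totalRequestedMb(splits, mapMemoryMb);
        long inputMb = totalInputBytes(splits) / (1024 * 1024);
        System.out.println("requested MB: " + requestedMb + ", input MB: " + inputMb);
    }
}
```

Running it prints `requested MB: 1024000, input MB: 500`: about 1000 GB of container memory is requested in total to process 500 MB of input, which is the imbalance the proposal aims to reduce.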

This message was sent by Atlassian JIRA
