hadoop-mapreduce-issues mailing list archives

From "Johannes Zillmann (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1859) maxConcurrentMapTask & maxConcurrentReduceTask per job
Date Sat, 28 Aug 2010 21:36:54 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903891#action_12903891 ]

Johannes Zillmann commented on MAPREDUCE-1859:

The Capacity Scheduler solution does not seem to be flexible enough for cases where you have
different kinds of input sources and different configurations of those sources, and not all of
these kinds and configurations are known at cluster startup.
If you have a system where users can set up an import from a database, the limits they might
want to put on that import can vary widely, because one user imports from a MySQL database,
another from Oracle, another from a clustered database, another from a database that is also serving other load, and so on.

> maxConcurrentMapTask & maxConcurrentReduceTask per job
> ------------------------------------------------------
>                 Key: MAPREDUCE-1859
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1859
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: job submission
>    Affects Versions: 0.20.2
>            Reporter: Johannes Zillmann
> It would be valuable if one could specify the maximum number of map/reduce slots to be
> used by a given job. An example would be a map-reduce job importing from a database, where
> you don't want 50 map tasks querying one database at the same time, but you also don't want to
> shrink the overall map task count.
> Although this is probably already possible through the Fair/Capacity Scheduler or a custom extension,
> I think it would be a good addition to the default TaskScheduler, since this seems to be more
> than a rarely used feature.
> It would also help in situations where you don't have control or ownership of
> the cluster.
> And it is more job-centric, whereas the existing scheduler extensions seem to be more job-type-centric.
> Implementing this feature should be relatively straightforward: add something like
> jobConf.setMaxConcurrentMapTask(int) and respect this setting in JobQueueTaskScheduler.
> I am not sure whether this feature would be compatible with the existing Fair/Capacity Schedulers.
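As a rough illustration of the proposal, the Java sketch below shows how a FIFO scheduler could honor a per-job cap on concurrently running map tasks. It is a standalone simulation under stated assumptions: the `Job` class, its `maxConcurrentMaps` field, and the `assignMaps` method are hypothetical and are not part of Hadoop's `JobQueueTaskScheduler` or `JobConf`. The proposed `jobConf.setMaxConcurrentMapTask(int)` setter is assumed to do nothing more than populate such a field.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: a FIFO scheduler that honors a per-job cap on
 * concurrently running map tasks. None of these classes are Hadoop APIs.
 */
public class CappedFifoScheduler {

    /** Minimal stand-in for a job; maxConcurrentMaps <= 0 means "no limit". */
    static final class Job {
        final String name;
        final int maxConcurrentMaps;
        int pendingMaps;
        int runningMaps;

        Job(String name, int totalMaps, int maxConcurrentMaps) {
            this.name = name;
            this.pendingMaps = totalMaps;
            this.maxConcurrentMaps = maxConcurrentMaps;
        }

        /** True if the job has work left and is below its (optional) cap. */
        boolean canLaunchMap() {
            boolean underCap = maxConcurrentMaps <= 0 || runningMaps < maxConcurrentMaps;
            return pendingMaps > 0 && underCap;
        }

        void launchMap() {
            pendingMaps--;
            runningMaps++;
        }
    }

    /**
     * Hands out up to freeSlots map slots, walking jobs in queue order.
     * A job at its cap is skipped, so the remaining slots go to the next job
     * instead of sitting idle. Returns one job name per assigned slot.
     */
    static List<String> assignMaps(List<Job> queue, int freeSlots) {
        List<String> assigned = new ArrayList<>();
        for (Job job : queue) {
            while (freeSlots > 0 && job.canLaunchMap()) {
                job.launchMap();
                assigned.add(job.name);
                freeSlots--;
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        // A database import with 50 map tasks, capped at 5 concurrent maps.
        Job dbImport = new Job("db-import", 50, 5);
        // An unrelated job with no cap.
        Job other = new Job("other", 20, 0);
        List<Job> queue = List.of(dbImport, other);

        System.out.println(assignMaps(queue, 12));
        System.out.println("db-import running=" + dbImport.runningMaps
                + " pending=" + dbImport.pendingMaps);
    }
}
```

With 12 free slots, `main` prints five `db-import` entries followed by seven `other` entries, then `db-import running=5 pending=45`. The capped job keeps its full task count (50 maps); only its concurrency is limited, which matches the goal of not shrinking the overall map task count while keeping the database from being hit by 50 tasks at once.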

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
