hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1408) Allow customization of job submission policies
Date Thu, 11 Feb 2010 03:22:27 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832373#action_12832373
] 

Chris Douglas commented on MAPREDUCE-1408:
------------------------------------------

The patch looks good. A few notes after a first pass (I have not examined the statistics code,
yet):
* {{TestGridmixSubmission}} is an expensive test to run; the cluster startup/shutdown can
be amortized across runs (as can the data generation). If this needs to clean up between jobs,
consider using {{@Before}} and {{@After}} instead of making separate tests (i.e., no subclassing
the unit test). The current patch also makes many fields protected without referencing them
in subclasses and also makes static, inner classes into member classes for reasons that aren't
obvious (e.g. {{FilterJobStory}})
* Very minor nits: the coding standards recommend whitespace after commas and before and after
braces. Control statements also must enclose their blocks in braces e.g. "{{if (expr) { stmt
}}}", never "{{if (expr) stmt}}", "{{foo(a, b, c) {}}" never "{{foo(a, b,c){}}". Some lines
in the patch exceed 80 characters
* {{DebugGridmix::createJobMonitor}} should allow IOExceptions to escape and fail instead
of logging the exception
* {{statistics}} is a more descriptive name than {{collector}} for the {{StatsCollector}}
references
* It would probably be cleaner to pull the {{REPLAY}} mode into a separate class, allowing
{{JobFactory}} to be an abstract base.
* The {{JobMonitor::process}} method can probably be removed, since it doesn't do any meaningful
work (the instance is instead passed to the downstream {{StatsCollector}}, which does). The
monitor is then only responsible for determining when the job is done, not its outcome The
unit test can subclass the downstream component instead of subclassing {{JobMonitor}}
* Instead of making {{JobFactory::rThread}} protected, adding an abstract {{createReaderThead}}
would also allow it to remain final
* The {{SerialJobFactory}} includes logic that preserves submission times, but it should be
submitting the job so that it executes immediately. Since the queue between the {{JobFactory}}
and {{JobSubmitter}} will never have more than 1 element in it, it should be sufficient to
submit new jobs at the system time.
* The {{SerialJobFactory}} should either be waiting on a condition or add new jobs to the
submitter while holding the lock. Otherwise it's possible (though extremely unlikely) for
signals to be missed.
* If a job fails submission in serial mode, wil the {{JobFactory}} be woken up?
* Instead of adding {{GridmixJobSubmissionPolicy::getPolicy}}, callers can use {{Configuration::getEnum}}
(so callers may specify the default)
* The poll interval in the submission policy enum should be specified as a final param on
the instance (as the name is), not by an abstract method. The {{getName}} method is the same
as its {{toString}}, isn't it? Is the "name" important?

> Allow customization of job submission policies
> ----------------------------------------------
>
>                 Key: MAPREDUCE-1408
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1408
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/gridmix
>            Reporter: rahul k singh
>         Attachments: 1408-1.patch
>
>
> Currently, gridmix3 replay job submission faithfully. For evaluation purposes, it would
be great if we can support other job submission policies such as sequential job submission,
or stress job submission.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message