hadoop-mapreduce-issues mailing list archives

From "Scott Chen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1819) RaidNode should be smarter in submitting Raid jobs
Date Tue, 28 Sep 2010 01:15:32 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915581#action_12915581 ]

Scott Chen commented on MAPREDUCE-1819:

It seems to me that the intent of the following code in RaidNode.java is not very clear:

    int runningJobsCount = jobMonitor.runningJobsCount(info.getName());
    // Is there a scan in progress for this policy?
    if (scanState.containsKey(info.getName())) {
      // If there is a scan in progress for this policy, we can have
      // upto maxJobsPerPolicy running jobs.
      if (runningJobsCount >= maxJobsPerPolicy) {
        ...
      }
    } else {
      // If there isn't a scan in progress for this policy, we don't
      // want to start a fresh scan if there is even one running job.
      if (runningJobsCount >= 1) {
        ...
      }
    }

Also, the logic that checks the policy's period is inside selectFiles().
Maybe we can move it into a method (something like shouldProcessPolicy()) along with the above logic.
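
To illustrate the suggestion, here is a minimal sketch of what such a method could look like. It is not taken from the patch: the parameter list, the name of the sketch class, and the way the period check is expressed (elapsed time since the last scan start compared against a period in milliseconds) are all assumptions, as is the choice to apply the period check only when no scan is in progress.

```java
public class ShouldProcessPolicySketch {

    /**
     * Hypothetical helper that gathers the per-policy checks in one place.
     * All parameters are assumptions made for this sketch; the real RaidNode
     * would read them from jobMonitor, scanState and the policy info.
     */
    static boolean shouldProcessPolicy(boolean scanInProgress,
                                       int runningJobsCount,
                                       int maxJobsPerPolicy,
                                       long lastScanStartMs,
                                       long periodMs,
                                       long nowMs) {
        if (scanInProgress) {
            // A scan is already in progress for this policy, so we can
            // have up to maxJobsPerPolicy running jobs.
            return runningJobsCount < maxJobsPerPolicy;
        }
        // No scan in progress: do not start a fresh scan if there is
        // even one running job.
        if (runningJobsCount >= 1) {
            return false;
        }
        // Assumed form of the period check that currently lives in
        // selectFiles(): start a fresh scan only once the period has elapsed.
        return nowMs - lastScanStartMs >= periodMs;
    }

    public static void main(String[] args) {
        // Scan in progress, 2 of 3 allowed jobs running.
        System.out.println(shouldProcessPolicy(true, 2, 3, 0L, 1000L, 5000L));
        // No scan in progress, but one job is still running.
        System.out.println(shouldProcessPolicy(false, 1, 3, 0L, 1000L, 5000L));
    }
}
```

With something like this, the loop over policies would reduce to a single call per policy, and selectFiles() would no longer need to know about the period.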

> RaidNode should be smarter in submitting Raid jobs
> --------------------------------------------------
>                 Key: MAPREDUCE-1819
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1819
>             Project: Hadoop Map/Reduce
>          Issue Type: Task
>          Components: contrib/raid
>    Affects Versions: 0.20.1
>            Reporter: Ramkumar Vadali
>            Assignee: Ramkumar Vadali
>         Attachments: MAPREDUCE-1819.patch, MAPREDUCE-1819.patch.2
>
> The RaidNode currently computes parity files as follows:
> 1. Using RaidNode.selectFiles() to figure out what files to raid for a policy
> 2. Using #1 repeatedly for each configured policy to accumulate a list of files.
> 3. Submitting a mapreduce job with the list of files from #2 using DistRaid.doDistRaid()
> This task addresses the fact that #2 and #3 happen sequentially. The proposal is to submit
> a separate mapreduce job for the list of files for each policy and use another thread to track
> the progress of the submitted jobs. This will help reduce the time taken for files to be raided.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
