hadoop-mapreduce-issues mailing list archives

From "Greg Roelofs (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1434) Dynamic add input for one job
Date Fri, 10 Sep 2010 20:09:44 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908171#action_12908171 ]

Greg Roelofs commented on MAPREDUCE-1434:
-----------------------------------------

I'm pretty sure "Submit Patch" is the preferred way to signal that a patch is ready for review;
that will also trigger Hudson to perform automatic checks (or should, anyway--Hudson's been
pretty flaky lately).

Before you do that, however, you should review the official Hadoop code conventions, which
are the same as the Sun/Oracle ones (http://www.oracle.com/technetwork/java/codeconv-138413.html)
except with half the indentation (4 -> 2, 8 -> 4, etc.).  I just skimmed over your patch
and noticed a strange mix of 2-space, 3-space, 4-space, and 8-space indentation...that makes
the code very hard to read.
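
For illustration only, here is a minimal sketch of the halved indentation. The class and
method names are hypothetical and do not come from the patch, and it assumes that "8 -> 4"
refers to the indentation of wrapped continuation lines:

```java
import java.util.List;

// Hypothetical class, used only to show the indentation style.
class IndentationExample {

  // Members are indented 2 spaces (the Sun/Oracle conventions use 4).
  static int countLongLines(List<String> lines, int maxLength) {
    int count = 0;
    // Each nested block adds another 2 spaces.
    for (String line : lines) {
      if (line.length() > maxLength) {
        count++;
      }
    }
    return count;
  }

  // A wrapped statement continues with 4 spaces (assumed counterpart of Sun's 8).
  static String describe(List<String> lines, int maxLength) {
    return "lines over " + maxLength + " chars: "
        + countLongLines(lines, maxLength);
  }
}
```

Using one indentation unit consistently throughout a patch is what makes it readable in review.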

> Dynamic add input for one job
> -----------------------------
>
>                 Key: MAPREDUCE-1434
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 0.20.3
>            Reporter: Xing Shi
>             Fix For: 0.20.3
>
>         Attachments: dynamic_input-v1.patch
>
>
> Currently we must first upload the data to HDFS before we can analyze it using
> Hadoop MapReduce.
> Sometimes the upload process takes a long time, so if we could add input while a job is
> running, that time could be saved.
> WHAT?
> Client:
> a) hadoop job -add-input jobId inputFormat ...
> Add the input to the job jobId.
> b) hadoop job -add-input done
> Tell the JobTracker that all input has been added.
> c) hadoop job -add-input status jobid
> Show how many inputs the job has.
> HOWTO?
> Mainly, I think we should do three things:
> 1. JobClient: JobClient should support adding input to a job; that is, JobClient generates
> the splits and submits them to the JobTracker.
> 2. JobTracker: JobTracker should support addInput and add the new tasks to the original
> mapTasks. Because the uploaded data will be processed quickly, the scheduler should also be
> updated to support holding a map task pending until the client tells the job that its input
> is done.
> 3. Reducer: the reducer should also update its mapNums so that it shuffles correctly.
> This is the rough idea, and I will update it.
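
As a rough illustration of the three steps quoted above, the bookkeeping could be modeled
along the following lines. This is a sketch under stated assumptions, not the patch's
implementation and not Hadoop's JobTracker API: the class DynamicInputJob and all of its
methods are hypothetical, and splits are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a job whose input can grow while it runs.
class DynamicInputJob {
  private final List<String> splits = new ArrayList<String>();
  private boolean inputDone = false;

  // Models "add input": the client submits newly generated splits,
  // each of which becomes an additional map task.
  synchronized void addInput(List<String> newSplits) {
    if (inputDone) {
      throw new IllegalStateException("input already marked done");
    }
    splits.addAll(newSplits);
  }

  // Models "done": the client signals that no more input will arrive.
  synchronized void markInputDone() {
    inputDone = true;
  }

  // Models "status": how many map inputs the job currently has.
  synchronized int getNumMaps() {
    return splits.size();
  }

  // A reducer may finish its shuffle only once the map count is final
  // and it has fetched the output of every map.
  synchronized boolean canFinishShuffle(int fetchedMapOutputs) {
    return inputDone && fetchedMapOutputs == splits.size();
  }
}
```

The point of the sketch is that the number of maps is not final until the client signals
"done", so anything that depends on that number, such as the reducers' shuffle, has to wait
for the signal.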

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

