hadoop-mapreduce-dev mailing list archives

From "Xing Shi (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-1434) Dynamic add input for one job
Date Mon, 01 Feb 2010 06:21:50 GMT
Dynamic add input for one job
-----------------------------

                 Key: MAPREDUCE-1434
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
         Environment: 0.19.0
            Reporter: Xing Shi


We always have to upload the data to HDFS first, and only then can we analyze it with Hadoop
MapReduce.

Sometimes the upload takes a long time. If we could add input while a job is running, that
time could be saved.

WHAT?

Client:

a) hadoop job -add-input jobId inputFormat ...
Add the input to the job identified by jobId.

b) hadoop job -add-input done
Tell the JobTracker that all of the input has been added.

c) hadoop job -add-input status jobid
Show how many inputs the job identified by jobid has.
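
To make the three proposed forms concrete, here is a minimal sketch of how the arguments after -add-input could be told apart. The class AddInputCommand and its fields are hypothetical names used only for illustration; nothing like this exists in Hadoop, and the "done" form is parsed without a job id only because that is how it is written above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical parser for the proposed "-add-input" arguments; not part of Hadoop. */
final class AddInputCommand {
    enum Kind { ADD, DONE, STATUS }

    final Kind kind;
    final String jobId;           // null for DONE, which the proposal shows without a job id
    final List<String> inputArgs; // inputFormat and whatever follows it (ADD only)

    private AddInputCommand(Kind kind, String jobId, List<String> inputArgs) {
        this.kind = kind;
        this.jobId = jobId;
        this.inputArgs = inputArgs;
    }

    /** Parses the arguments that follow "-add-input" on the command line. */
    static AddInputCommand parse(String... args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("missing arguments after -add-input");
        }
        // Form (b): "done" on its own.
        if (args.length == 1 && args[0].equals("done")) {
            return new AddInputCommand(Kind.DONE, null, Collections.<String>emptyList());
        }
        // Form (c): "status jobid".
        if (args[0].equals("status")) {
            if (args.length != 2) {
                throw new IllegalArgumentException("usage: status jobid");
            }
            return new AddInputCommand(Kind.STATUS, args[1], Collections.<String>emptyList());
        }
        // Form (a): "jobId inputFormat ...".
        if (args.length < 2) {
            throw new IllegalArgumentException("usage: jobId inputFormat ...");
        }
        List<String> rest = Arrays.asList(args).subList(1, args.length);
        return new AddInputCommand(Kind.ADD, args[0], rest);
    }
}
```

One consequence of this reading is that a job could not be literally named "done" or "status"; a real design would probably want distinct flags for the three operations.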



HOWTO?

Mainly, I think we should do three things:

1. JobClient: the JobClient should support adding input to a job. In practice, the JobClient
generates the splits and submits them to the JobTracker.

2. JobTracker: the JobTracker should support addInput and add the new tasks to the job's
original mapTasks. Because the uploaded data will be processed quickly, the scheduler should
also be updated so that it can keep map tasks pending until the client tells the job that
the input is done.

3. Reducer: the reducer should also update its map count (mapNums) so that it shuffles correctly.
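
The bookkeeping behind steps 2 and 3 can be sketched as a small in-memory model. This is only one reading of the idea, under the assumption that "pending" means the map phase stays open until the client signals that input is done. DynamicInputJob and all of its methods are hypothetical and do not correspond to real JobTracker code.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

/** Hypothetical model of a job that accepts input while it runs; not Hadoop code. */
final class DynamicInputJob {
    private final Queue<String> pendingSplits = new ArrayDeque<>();
    private int totalMaps = 0;
    private int completedMaps = 0;
    private boolean inputDone = false;

    /** Step 2: newly submitted splits become additional map tasks of the same job. */
    synchronized void addInput(List<String> splits) {
        if (inputDone) {
            throw new IllegalStateException("input was already marked done");
        }
        pendingSplits.addAll(splits);
        totalMaps += splits.size();
    }

    /** Called when the client reports that no more input will arrive. */
    synchronized void markInputDone() {
        inputDone = true;
    }

    /** Scheduler hook: the next split to run, or null if none is waiting right now. */
    synchronized String nextMapTask() {
        return pendingSplits.poll();
    }

    /** Records that one map task has finished. */
    synchronized void mapCompleted() {
        if (completedMaps >= totalMaps) {
            throw new IllegalStateException("no outstanding map task");
        }
        completedMaps++;
    }

    /** Step 3: reducers would re-read this count, which can grow until input is done. */
    synchronized int totalMaps() {
        return totalMaps;
    }

    /** The map phase ends only when input is done and every known map has finished. */
    synchronized boolean mapPhaseFinished() {
        return inputDone && completedMaps == totalMaps;
    }
}
```

The key point is that mapPhaseFinished() stays false even when every current map has completed, so reducers keep waiting for map outputs that do not exist yet instead of finishing the shuffle early.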

This is only a rough idea, and I will update it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

