hadoop-mapreduce-issues mailing list archives

From "Xing Shi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1434) Dynamic add input for one job
Date Wed, 10 Feb 2010 05:20:28 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831873#action_12831873
] 

Xing Shi commented on MAPREDUCE-1434:
-------------------------------------

*1. Timeout*
    I had forgotten about the timeout. Could it be set to unlimited for the job, or at least
for the reduce tasks?

*2. InputFormat*
    For now, I assume that any added input uses the same input format as the running job.

    We could accept data in a different input format for this job, but I don't think that is
the purpose of adding input.
    If different input formats were supported, we would probably also need a different mapper
for each input format.

*3. Progress on the issue*

   We have built a demo that supports dynamically adding input. It does not use the InputFormat
to do this; instead it uses the interaction between the JobClient and the JobTracker (in fact
the JobInProgress, JIP).

*Submit a job*
    1) We submit a job that supports dynamic input (mapred.dynamic.input=true).
    2) The JobTracker generates the cleanup TIP and the setup TIP using the maximum numbers
(Integer.MAX_VALUE and Integer.MAX_VALUE - 1).
    3) The JobTracker keeps the maxNo map pending (maps are sorted by length) and sets
{code}dynamicAddInputStatus=DYNAMIC_ADD_INPUT_RUNNING{code}
    Until dynamicAddInputStatus is set to DYNAMIC_ADD_INPUT_DONE, the maxNo map stays in the
pending phase. Here we also need to consider the reduce tasks' run time against the timeout.
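
The pending rule in step 3 can be sketched as a small standalone model. This is only an illustration under assumptions: the class DynamicInputGate and its methods are hypothetical names, not code from the demo or from Hadoop, and only the two status values come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the pending rule; none of these names are Hadoop APIs.
class DynamicInputGate {
    enum Status { DYNAMIC_ADD_INPUT_RUNNING, DYNAMIC_ADD_INPUT_DONE }

    private Status status = Status.DYNAMIC_ADD_INPUT_RUNNING;
    private final int heldMapId;                         // stands in for the "maxNo" map
    private final List<Integer> pending = new ArrayList<>();

    DynamicInputGate(List<Integer> mapIds, int heldMapId) {
        this.pending.addAll(mapIds);
        this.heldMapId = heldMapId;
    }

    // Maps that may be scheduled right now, in pending order.
    List<Integer> schedulable() {
        List<Integer> out = new ArrayList<>();
        for (int id : pending) {
            // The held map is only released once input is declared done.
            if (id != heldMapId || status == Status.DYNAMIC_ADD_INPUT_DONE) {
                out.add(id);
            }
        }
        return out;
    }

    // A map created for a newly added split.
    void addMap(int id) { pending.add(id); }

    // Corresponds to the client declaring that no more input will arrive.
    void inputDone() { status = Status.DYNAMIC_ADD_INPUT_DONE; }
}
```

Because one map is always left unscheduled while the status is DYNAMIC_ADD_INPUT_RUNNING, the map phase cannot complete before the client has finished adding input.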

*Add input for a job*
    1) JobClient: addInput <jobId>, <inputDir>
        1.1) Check that the client can access the job and that the job supports dynamic
input.
        1.2) Compute the splits with conf.getInputFormat().getSplits(conf, conf.getNumMapTasks());
and write them to the job's submitJobDir.
        1.3) Call the JobTracker to add the input to the job.
    2) JobTracker (in fact the JIP)
        2.1) Look up the JIP by job ID.
        2.2) Create a new TIP for each added split.
        2.3) Schedule the new maps to run, excluding the maxNo map.
    3) ReduceTask: it updates numMaps, because the shuffle ends once the number of shuffled
map outputs >= numMaps. numMaps is updated through getMapCompletionEvents.
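
The reducer-side rule in step 3 can be illustrated with a minimal sketch. It assumes a hypothetical class ShuffleProgress that only tracks two counters; it is not the real ReduceTask code, and the way the new map count reaches the reducer (getMapCompletionEvents in the demo) is reduced here to a plain method call.

```java
// Hypothetical model of the reducer-side rule; not the real ReduceTask code.
class ShuffleProgress {
    private int numMaps;        // total maps the reducer currently knows about
    private int copiedOutputs;  // map outputs fetched so far

    ShuffleProgress(int initialNumMaps) {
        this.numMaps = initialNumMaps;
    }

    // Called when a completion-event poll reports the current map count.
    // The count is assumed to only grow as input is added.
    void updateNumMaps(int reported) {
        numMaps = Math.max(numMaps, reported);
    }

    // Called after one map output has been copied.
    void outputCopied() {
        copiedOutputs++;
    }

    // The shuffle ends once every known map output has been copied.
    boolean shuffleFinished() {
        return copiedOutputs >= numMaps;
    }
}
```

If numMaps were never updated, a reducer that started with the original map count would stop shuffling before the outputs of the added maps had been copied.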

*Add input done*
    1) JobClient: inputDone <jobId>
    2) The JobTracker updates dynamicAddInputStatus:
    {code}dynamicAddInputStatus=DYNAMIC_ADD_INPUT_DONE{code}
    After that, the maxNo map can be scheduled.

*How to use it?*

The client decides whether to add more input or to signal that input is done, mainly by judging
the size of the new input or the number of new files.
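
One possible shape for that client-side decision is sketched below. Everything in it is an assumption for illustration: the class AddInputPolicy, the thresholds minBytes and minFiles, and the uploadFinished flag are hypothetical and do not appear in the demo.

```java
// Hypothetical client-side policy; thresholds and names are illustrative only.
class AddInputPolicy {
    enum Action { WAIT, ADD_INPUT, INPUT_DONE }

    private final long minBytes;  // add input once this many new bytes are available
    private final int minFiles;   // or once this many new files are available

    AddInputPolicy(long minBytes, int minFiles) {
        this.minBytes = minBytes;
        this.minFiles = minFiles;
    }

    // newBytes/newFiles describe input uploaded since the last addInput call.
    Action decide(long newBytes, int newFiles, boolean uploadFinished) {
        if (newBytes >= minBytes || newFiles >= minFiles) {
            return Action.ADD_INPUT;
        }
        if (uploadFinished) {
            // Flush any remainder first; signal done only when nothing is left.
            return (newBytes > 0 || newFiles > 0) ? Action.ADD_INPUT : Action.INPUT_DONE;
        }
        return Action.WAIT;
    }
}
```

A client loop would call decide after each upload check, run addInput on ADD_INPUT, and run inputDone once on INPUT_DONE.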

_Do you have any suggestions?_

> Dynamic add input for one job
> -----------------------------
>
>                 Key: MAPREDUCE-1434
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>         Environment: 0.19.0
>            Reporter: Xing Shi
>
> We always have to upload the data to HDFS first, and only then can we analyze it with Hadoop MapReduce.
> Sometimes the upload takes a long time, so if we could add input while a job is running, that time would be saved.
> WHAT?
> Client:
> a) hadoop job -add-input jobId inputFormat ...
> Add the input to jobId
> b) hadoop job -add-input done
> Tell the JobTracker that all input has been provided.
> c) hadoop job -add-input status jobid
> Show how much input the job has.
> HOWTO?
> Mainly, I think we should do three things:
> 1. JobClient: the JobClient should support adding input to a job; in fact, the JobClient generates the splits and submits them to the JobTracker.
> 2. JobTracker: the JobTracker should support addInput and add the new tasks to the original mapTasks. Because the uploaded data will be processed quickly, the scheduler should also be updated to keep a map task pending until the client tells the job that input is done.
> 3. Reducer: the reducer should also update mapNums so that it shuffles correctly.
> This is the rough idea, and I will update it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

