hadoop-mapreduce-user mailing list archives

From felix gao <gre1...@gmail.com>
Subject Re: How do hadoop work in details
Date Wed, 12 Jan 2011 17:40:36 GMT
Arun,

The information is very helpful.  What scheduler do you suggest when we have
a mix of production and ad-hoc Pig jobs running at the same time and we would
like to guarantee the SLA for the production jobs?

Thanks,

Felix

On Sun, Jan 9, 2011 at 12:35 AM, Arun C Murthy <acm@yahoo-inc.com> wrote:

>
> On Dec 29, 2010, at 2:43 PM, felix gao wrote:
>
>  Hi all,
>>
>> I am trying to figure out what exactly happens inside a job.
>>
>> 1) When the jobtracker launches a task to be run, how does it impact the
>> currently running jobs if the currently running jobs have higher, the same,
>> or lower priorities, using the default queue?
>>
>> 2) What if a low-priority job is running that is holding all the reducer
>> slots while its mappers are only halfway done, and a high-priority job
>> comes in and takes all the mapper slots but cannot complete because all the
>> reducer slots are taken by the low-priority job?
>>
>>
> Both 1) and 2) really depend on the scheduler you are using - Default,
> FairScheduler or CapacityScheduler.
>
> With the Default scheduler, 2) is a real problem. The CapacityScheduler
> doesn't allow priorities within the same queue for precisely this reason,
> since it doesn't have preemption. I'm not sure whether the FairScheduler
> handles it.
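>
> For what it's worth, a job's priority itself is just set on its JobConf
> before submission; a rough, untested sketch (the driver class name is a
> placeholder):
>
>   // org.apache.hadoop.mapred API
>   JobConf conf = new JobConf(MyDriver.class);
>   conf.setJobPriority(JobPriority.HIGH);  // or VERY_HIGH, LOW, etc.
>
> How (or whether) that priority is honoured is then up to the scheduler, as
> above.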
>
>
>> 3) When are mappers allocated on the slaves, and when are reducers
>> allocated?
>>
>>
> Usually, reduces are allocated only after a certain percentage of maps are
> complete (5% by default). Use mapred.reduce.slowstart.completed.maps to
> control this. Look at JobInProgress.java.
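>
> That threshold can also be overridden per job on the JobConf; a rough
> sketch, assuming "conf" is the job's JobConf:
>
>   // e.g. hold the reduces back until 80% of the maps have finished
>   conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.80f);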
>
>
>> 4) Do mappers pass all the data to reducers using RPC, or do they write
>> their output to HDFS for the reducers to pick up?
>>
>>
> Maps sort/combine their output and write it to local disk. The reduces then
> copy it over HTTP (we call this the 'shuffle' phase). The TT on which the
> map ran serves the map's output via an embedded webserver. Look at
> ReduceTask.java and TaskTracker.MapOutputServlet.doGet.
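>
> Since the combine runs in that map-side sort/spill, setting a combiner is
> the usual way to shrink what gets shuffled; roughly (the reducer class name
> is a placeholder, and it must be safe to apply more than once):
>
>   conf.setCombinerClass(MyReducer.class);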
>
>
>> 5) Within a job, when and where does all the I/O occur?
>>
>>
> Typically the input to the maps (i.e. the InputFormat) and the output of the
> reduces (i.e. the OutputFormat). Look at MapTask.java and ReduceTask.java.
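>
> In JobConf terms that boils down to something like the following (the
> driver class and paths are just placeholders):
>
>   // org.apache.hadoop.mapred API
>   JobConf conf = new JobConf(MyDriver.class);
>   conf.setInputFormat(TextInputFormat.class);    // maps read the input splits
>   conf.setOutputFormat(TextOutputFormat.class);  // reduces write the final output
>   FileInputFormat.setInputPaths(conf, new Path("/user/felix/input"));
>   FileOutputFormat.setOutputPath(conf, new Path("/user/felix/output"));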
>
> hope this helps,
> Arun
>
>
>
>> I know these are a lot of low-level questions; if you can point me to the
>> right place to look, that should be enough.
>>
>> Thanks,
>>
>> Felix
>>
>>
>
