hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Martin Becker <_martinbec...@web.de>
Subject Re: Reduce Task Priority / Scheduler
Date Mon, 20 Dec 2010 09:07:20 GMT
I just reread my first post. Maybe I was not clear enough:
It is only important to me that the Reduce tasks _start_ in a
specified order based on their key. That is the only additional
constraint I need.

On Mon, Dec 20, 2010 at 9:51 AM, Martin Becker <_martinbecker@web.de> wrote:
> As far as I understood, MapReduce is waiting for all Mappers to finish
> until it starts running Reduce tasks. Am I mistaken here? If I am not,
> then I do not see any more synchrony being introduced than there
> already is (no locks required). Of course I am not aware of all the
> internals, but MapReduce is working with a single JobTracker, which
> distributes Reduce tasks to the different nodes (see
> http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Overview).
> So the only point where my "theory" would break is, if Reducer start
> before Mappers finish. Otherwise the JobTracker should be able to
> schedule Reduce tasks in a specific order.
>
> On Mon, Dec 20, 2010 at 4:45 AM, Harsh J <qwertymaniac@gmail.com> wrote:
>> You could use sort of a distributed lock service to achieve this
>> (ZooKeeper can help). But such things ought to be avoided as David
>> pointed out above.
>>
>> On Sun, Dec 19, 2010 at 9:09 PM, Martin Becker <_martinbecker@web.de> wrote:
>>> Hello everybody,
>>>
>>> is there a possibility to make sure that certain/all reduce tasks,
>>> i.e. the reducers to certain keys, are executed in a specified order?
>>> This is Job internal, so the Job Scheduler is probably the wrong place to start?
>>> Does the order induced by the Comparable interface influence the
>>> execution order at all?
>>>
>>> Thanks in advance,
>>> Martin
>>>
>>
>>
>>
>> --
>> Harsh J
>> www.harshj.com
>>
>

Mime
View raw message