Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Message-ID: <5035F6ED.4000104@firma.seznam.cz>
Date: Thu, 23 Aug 2012 11:25:01 +0200
From: Jan Lukavský <jan.lukavsky@firma.seznam.cz>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:14.0) Gecko/20120714 Thunderbird/14.0
MIME-Version: 1.0
To: <user@hadoop.apache.org>
Subject: Running map tasks after all reduces have finished
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit

Hi all,

we are seeing strange behaviour of the JobTracker in the following scenario:
  - the job finishes the map phase and starts the reduce phase
  - after all reducers have finished the shuffle phase, we lose a 
tasktracker that isn't running any reducer, so all reducers keep running 
in the reduce phase
  - map tasks that were running on the lost tasktracker are rescheduled
  - the reduces may finish before the rescheduled map tasks, so the job 
is blocked waiting for those maps to finish, even though their output is 
simply discarded
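To make the question concrete, here is a toy model of the scenario. It is NOT the JobTracker's actual code; the class, record and method names are hypothetical. It contrasts the completion rule we observe (the job waits until every map and every reduce has finished) with a hypothetical rule under which maps still running after all reduces have finished no longer block the job:

```java
import java.util.List;

/** Toy model of the scenario above; NOT the JobTracker's real code. All names are hypothetical. */
public class JobCompletionModel {

    /** Minimal task state: a task is a map or a reduce, and is either finished or not. */
    record Task(boolean isMap, boolean finished) {}

    /** Observed behaviour: the job completes only when every map AND every reduce has finished. */
    static boolean completesAsObserved(List<Task> tasks) {
        return tasks.stream().allMatch(Task::finished);
    }

    /**
     * Hypothetical alternative: once every reduce has finished, the output of any
     * rescheduled map would be discarded anyway, so it no longer blocks completion.
     * Map-only jobs (no reduces) fall back to the observed rule.
     */
    static boolean completesIfMapsIgnoredAfterReduces(List<Task> tasks) {
        boolean hasReduces = tasks.stream().anyMatch(t -> !t.isMap());
        if (!hasReduces) {
            return completesAsObserved(tasks);
        }
        return tasks.stream()
                .filter(t -> !t.isMap())
                .allMatch(Task::finished);
    }

    public static void main(String[] args) {
        // Scenario: two maps finished, one map re-run after the tasktracker was lost,
        // and both reduces have already finished.
        List<Task> tasks = List.of(
                new Task(true, true),
                new Task(true, true),
                new Task(true, false),  // rescheduled map, still running
                new Task(false, true),
                new Task(false, true));
        System.out.println(completesAsObserved(tasks));                // false
        System.out.println(completesIfMapsIgnoredAfterReduces(tasks)); // true
    }
}
```

The second rule only illustrates what "not waiting for discarded map output" would mean; whether the JobTracker should behave that way is exactly the question below.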

Is this behaviour a bug or a feature? :) I haven't found any JIRA issue 
describing it; if one exists, could anyone point me to it?

Thanks,
  Jan