hadoop-mapreduce-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: How to recover reducer task data on a different data node?
Date Thu, 03 Jul 2014 11:40:20 GMT
Adding to what Jungi Jeong said, if you can get your hands on the book
*Hadoop: The Definitive Guide* by Tom White, that would help as well, as
it explains this in significant detail.


On Thu, Jul 3, 2014 at 6:29 AM, Jungi Jeong <jgjeong@calab.kaist.ac.kr> wrote:

> As far as I know, map outputs are stored on the local disks of the nodes
> where the map tasks ran, and the paths to the map outputs can be
> constructed from the username, jobId, and taskId (even after the map tasks
> have terminated).
> Information about map outputs (which maps are done and where their outputs
> are located) is available from the JobTracker, so the TaskTracker fetches
> it from the JobTracker and keeps it until the job finishes.
> The newly launched reduce task asks the TaskTracker, which can construct
> the path to the map outputs from the jobId and taskId, to transfer the
> corresponding map outputs. The data is transferred over an HTTP connection
> (for details, look at the class MapOutputServlet in TaskTracker.java).
> I hope this answers your question.
> - Jungi
> On 3 July 2014 18:59, James Teng <tenglinxiao@outlook.com> wrote:
>> Hi,
>> thanks for your quick reply.
>> Could you please explain in a bit more detail? For example, how does the
>> new reducer node find out which map nodes have to transfer data to it,
>> and how does it communicate with them to transfer the data?
>> In other words, how is the data copied?
>> James.
>> ------------------------------
>> Date: Thu, 3 Jul 2014 16:52:57 +0800
>> Subject: Re: How to recover reducer task data on a different data node?
>> From: sshi@gopivotal.com
>> To: user@hadoop.apache.org
>> The new reduce task will start from scratch and copy all map outputs
>> from all mapper nodes.
>> Regards,
>> *Stanley Shi,*
>> On Thu, Jul 3, 2014 at 2:28 PM, James Teng <tenglinxiao@outlook.com>
>> wrote:
>> First, I should say that although I am not new to Hadoop, I am not an
>> expert on it either.
>> I would like to ask about one aspect of the MapReduce framework. The
>> scenario is described below.
>> When a reduce task fails on one datanode, the job tracker will try to
>> schedule the task on another node and continue running. My question is:
>> how does the new node get the assigned data back?
>> When the map phase is done, the output data is copied to the respective
>> partitioned reducer. If the reducer is now created on a new node, what
>> does the new node do to get all of its map-allocated data back?
>> thanks in advance.
>> James.
> --
> Jungi Jeong
> M.S Candidate, Computer Architecture Lab.
> Div. of Computer Science, KAIST
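
The fetch path described in the thread above (map outputs located by username, jobId, and taskId, then served over HTTP by the TaskTracker) can be roughly sketched as follows. This is only an illustration, not Hadoop source code. The class MapOutputLocator, its methods, the host names, the user name, and the local directory are hypothetical. The directory layout and the /mapOutput query parameters are written from memory of Hadoop 1.x and should be checked against MapOutputServlet in TaskTracker.java for the version in use.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; this is not Hadoop source code.
class MapOutputLocator {

    // Local file a TaskTracker would keep for one finished map attempt.
    // ASSUMPTION: directory layout as recalled from Hadoop 1.x.
    static String localOutputPath(String localDir, String user,
                                  String jobId, String mapAttemptId) {
        return localDir + "/taskTracker/" + user + "/jobcache/" + jobId
                + "/" + mapAttemptId + "/output/file.out";
    }

    // URL a reduce task would request from the TaskTracker that ran the map.
    // ASSUMPTION: query parameter names as recalled from Hadoop 1.x.
    static URI fetchUrl(String trackerHttpAddress, String jobId,
                        String mapAttemptId, int reducePartition) {
        return URI.create("http://" + trackerHttpAddress
                + "/mapOutput?job=" + jobId
                + "&map=" + mapAttemptId
                + "&reduce=" + reducePartition);
    }

    // A re-launched reduce task holds no data from the failed attempt,
    // so it plans one fetch per completed map, for its own partition.
    // completedMaps: map attempt id -> HTTP address of the TaskTracker.
    static List<URI> fetchPlan(Map<String, String> completedMaps,
                               String jobId, int reducePartition) {
        List<URI> plan = new ArrayList<>();
        for (Map.Entry<String, String> e : completedMaps.entrySet()) {
            plan.add(fetchUrl(e.getValue(), jobId, e.getKey(), reducePartition));
        }
        return plan;
    }

    public static void main(String[] args) {
        String jobId = "job_201407030001_0001";          // hypothetical job id
        Map<String, String> completedMaps = new LinkedHashMap<>();
        // Hypothetical hosts; in a real cluster this list comes from the
        // map completion events tracked by the JobTracker.
        completedMaps.put("attempt_201407030001_0001_m_000000_0", "node1:50060");
        completedMaps.put("attempt_201407030001_0001_m_000001_0", "node2:50060");

        System.out.println(localOutputPath("/tmp/mapred/local", "james", jobId,
                "attempt_201407030001_0001_m_000000_0"));
        for (URI u : fetchPlan(completedMaps, jobId, 3)) {
            System.out.println(u);
        }
    }
}
```

The point of the sketch is that nothing from the failed reduce attempt is reused. The replacement reducer, wherever it is scheduled, needs only the list of completed maps and their TaskTracker addresses to pull its partition again from every mapper node.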
