hadoop-common-user mailing list archives

From James Teng <tenglinx...@outlook.com>
Subject RE: How to recover reducer task data on a different data node?
Date Fri, 04 Jul 2014 02:19:30 GMT
OK, got it. Thanks, Shahab and Jungi, for your helpful replies. :)

Date: Thu, 3 Jul 2014 07:40:20 -0400
Subject: Re: How to recover reducer task data on a different data node?
From: shahab.yunus@gmail.com
To: user@hadoop.apache.org

Adding to what Jungi Jeong said, if you can get your hands on the book Hadoop: The Definitive
Guide by Tom White, that would help as well, as it explains this in significant detail.


On Thu, Jul 3, 2014 at 6:29 AM, Jungi Jeong <jgjeong@calab.kaist.ac.kr> wrote:

As far as I know, map outputs are stored on the local disks of the nodes where the map tasks ran,
and the paths to the map outputs can be constructed from username / jobId / taskId (even after
the map tasks have terminated).
The information about map outputs (which maps are done and where they are located) is available
from the JobTracker, so the TaskTracker fetches it from the JobTracker and keeps it until the job
finishes.
The newly launched reduce task asks the TaskTracker that holds the map outputs, which can build
the path to them from jobId and taskId, to transfer the corresponding map outputs, and the data is
transferred over an HTTP connection (for details, look for the class MapOutputServlet in
TaskTracker.java).
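
A rough sketch of the two lookups described above, assuming Hadoop 1.x conventions. The directory
layout and the `/mapOutput?job=...&map=...&reduce=...` query format are assumptions recalled from
the 1.x source and should be checked against TaskTracker.java; the function names, host, port,
and IDs are hypothetical and only illustrate how username / jobId / taskId could combine.

```python
from urllib.parse import urlencode


def map_output_path(local_dir, user, job_id, map_attempt_id):
    """Build the local path where one map attempt's output would be kept.

    Assumed layout (not confirmed by the thread):
    <local_dir>/taskTracker/<user>/jobcache/<job_id>/<attempt_id>/output/file.out
    """
    return "/".join([
        local_dir.rstrip("/"), "taskTracker", user, "jobcache",
        job_id, map_attempt_id, "output", "file.out",
    ])


def map_output_url(host, http_port, job_id, map_attempt_id, reduce_partition):
    """Build the URL a reduce task would request from the map-side TaskTracker.

    The servlet name and parameter names are assumptions; the idea is only
    that job, map attempt, and reduce partition identify the data to send.
    """
    query = urlencode({
        "job": job_id,
        "map": map_attempt_id,
        "reduce": reduce_partition,
    })
    return f"http://{host}:{http_port}/mapOutput?{query}"


if __name__ == "__main__":
    # Hypothetical IDs and host; 50060 is assumed as the TaskTracker HTTP port.
    job = "job_201407030629_0001"
    attempt = "attempt_201407030629_0001_m_000004_0"
    print(map_output_path("/data/mapred/local", "james", job, attempt))
    print(map_output_url("tt-node-3", 50060, job, attempt, 2))
```

The point of the sketch is that nothing reducer-specific is stored on the map side: any reduce
attempt that knows the job, the map attempt, and its own partition number can ask for the data.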

I hope this answers your question.
- Jungi

On 3 July 2014 18:59, James Teng <tenglinxiao@outlook.com> wrote:

Hi, thanks for your quick reply. Could you please explain in a bit more detail? For example, how
does the new reducer node find out which map nodes have to transfer data to it, and how does it
communicate with them to transfer the data? Or, by what mechanism is the data copied?

Date: Thu, 3 Jul 2014 16:52:57 +0800
Subject: Re: How to recover reducer task data on a different data node?
From: sshi@gopivotal.com
To: user@hadoop.apache.org

It will start from scratch and copy all map outputs from all mapper nodes.
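
A minimal sketch of what "start from scratch" implies, as a toy model rather than Hadoop code:
a retried reduce attempt has fetched nothing, so its fetch plan covers every completed map. The
function name, the map IDs, and the host names are hypothetical.

```python
def plan_fetches(completed_maps, reduce_partition):
    """Return one (host, map_id, partition) fetch per completed map.

    completed_maps maps each map ID to the host holding its output.
    Nothing from a failed attempt is reused, so every map is included.
    """
    return [
        (host, map_id, reduce_partition)
        for map_id, host in sorted(completed_maps.items())
    ]


if __name__ == "__main__":
    completed = {"m_000001": "node-b", "m_000000": "node-a"}
    # The plan depends only on the completed maps and the partition number,
    # not on which node the retried reduce attempt runs on.
    for fetch in plan_fetches(completed, reduce_partition=3):
        print(fetch)
```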

Regards,
Stanley Shi

On Thu, Jul 3, 2014 at 2:28 PM, James Teng <tenglinxiao@outlook.com> wrote:

First, I should say that although I am not new to Hadoop, I am not an expert on it either.
I would like to ask about one issue in the MapReduce framework. The scenario is described below.

When a reduce task fails on one datanode, the JobTracker will try to schedule another node to
set up this reduce task and continue running. My question is: how does the new node get the
assigned data back? When the map phase is done, the output data is copied to the respective
partitioned reducer. If the reduce task is now created on a new node, what actions does the new
node take to get all of the map-allocated data back?

Thanks in advance.


Jungi Jeong

M.S Candidate, Computer Architecture Lab.
Div. of Computer Science, KAIST
