hadoop-common-user mailing list archives

From Jungi Jeong <jgje...@calab.kaist.ac.kr>
Subject Re: How to recover reducer task data on a different data node?
Date Thu, 03 Jul 2014 10:29:24 GMT
As far as I know, map outputs are stored on the local disks of the nodes
where the map tasks ran, and the paths to the map outputs can be
constructed from the username, jobId, and taskId (even after the map tasks
have terminated).
The information about map outputs (which maps are done and where their
outputs are located) is available from the JobTracker, so the TaskTracker
fetches it from the JobTracker and keeps it until the job finishes.
The newly launched reduce task asks the TaskTracker, which can build the
path to the map outputs from the jobId and taskId, to send the
corresponding map outputs, and the data is transferred over an HTTP
connection (for details, look for the class MapOutputServlet in
TaskTracker.java).
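
To make the path and HTTP steps above concrete, here is a minimal sketch in
Java. It is not Hadoop's own code. The class and method names are made up
for illustration, and the directory layout (taskTracker/<user>/jobcache/...)
and the /mapOutput query parameters (job, map, reduce) are written from my
memory of Hadoop 1.x, so please treat them as assumptions and check them
against TaskTracker.java for your version.

```java
/**
 * Illustrative sketch only; these are hypothetical helpers, not Hadoop classes.
 * The path layout and query parameter names are assumptions based on Hadoop 1.x.
 */
final class MapOutputLocator {

    private MapOutputLocator() {
        // static helpers only
    }

    /**
     * Builds the assumed local path of a finished map attempt's output file
     * from the user name, job id, and map attempt id.
     */
    static String localOutputPath(String localDir, String user,
                                  String jobId, String mapAttemptId) {
        return String.join("/", localDir, "taskTracker", user, "jobcache",
                jobId, mapAttemptId, "output", "file.out");
    }

    /**
     * Builds the assumed URL a reduce task would request from the
     * TaskTracker on the map's node to fetch one partition of that output.
     */
    static String fetchUrl(String host, int httpPort, String jobId,
                           String mapAttemptId, int reducePartition) {
        return "http://" + host + ":" + httpPort
                + "/mapOutput?job=" + jobId
                + "&map=" + mapAttemptId
                + "&reduce=" + reducePartition;
    }
}
```

For example, a reducer for partition 3 would call fetchUrl once per
completed map attempt, using the host of the node where that map ran and
the TaskTracker HTTP port (50060 by default in Hadoop 1.x). The serving
side would resolve the same jobId and attempt id to a file with something
like localOutputPath.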

I hope this can answer your question.
- Jungi

On 3 July 2014 18:59, James Teng <tenglinxiao@outlook.com> wrote:

> Hi,
> Thanks for your quick reply.
> Could you please explain in a bit more detail? For example, how does the
> new reducer node find out which map nodes have to transfer data to it,
> how does it communicate with them to transfer the data, and by what
> means is the data copied?
> James.
> ------------------------------
> Date: Thu, 3 Jul 2014 16:52:57 +0800
> Subject: Re: How to recover reducer task data on a different data node?
> From: sshi@gopivotal.com
> To: user@hadoop.apache.org
> It will start from scratch and copy all map outputs from all mapper nodes.
> Regards,
> *Stanley Shi,*
> On Thu, Jul 3, 2014 at 2:28 PM, James Teng <tenglinxiao@outlook.com>
> wrote:
> First, I would like to say that although I am not new to Hadoop, I am
> not an expert on it either.
> I would like to ask about one issue in the MapReduce framework. Below is
> a description of the scenario.
> When a reduce task fails on one datanode, the job tracker will try to
> schedule it on another node and continue running. My question is how
> the assigned data gets back to the new node.
> When the map phase is done, the output data is copied to the respective
> partitioned reducer. If the reducer is now created on a new node, what
> actions does the new node take to get all of its map-allocated data
> back?
> Thanks in advance.
> James.

Jungi Jeong
M.S Candidate, Computer Architecture Lab.
Div. of Computer Science, KAIST
