hadoop-common-dev mailing list archives

From newpant <newpant0...@gmail.com>
Subject Re: Available of Intermediate data generated by mappers
Date Wed, 13 Oct 2010 08:07:23 GMT
Hi. According to Hadoop: The Definitive Guide, a map task first writes its
intermediate output to an in-memory buffer and then spills it to the local
disk directory configured by mapred.local.dir. So as far as I know, if the
intermediate data is lost, the only fix is to redo (re-run) the map task.

If I'm wrong, please correct me.
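The buffer-then-spill behaviour described above can be sketched as a toy Python simulation. This is a minimal sketch under simplifying assumptions, not Hadoop code: `MapTask`, `fetch_map_output` and `buffer_limit` are hypothetical names, the buffer is sized in records rather than megabytes, and a dictionary stands in for the node's mapred.local.dir. Real Hadoop also partitions and merges the spill files into one output file, and the framework, not the reducer, reschedules a map whose output is lost.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Record = Tuple[str, int]


@dataclass
class MapTask:
    """Toy model of one map task; all names here are hypothetical, not Hadoop APIs."""
    task_id: str
    records: List[Record]      # pairs the map function emits for its input split
    buffer_limit: int = 4      # records held in memory before a spill (stand-in for the sort buffer)
    # Stand-in for this node's mapred.local.dir: a single copy, not replicated like HDFS blocks.
    local_disk: Dict[str, List[List[Record]]] = field(default_factory=dict)
    runs: int = 0              # how many times the task has been executed

    def run(self) -> None:
        """Emit records into a buffer, sorting and spilling it whenever it fills up."""
        self.runs += 1
        spills: List[List[Record]] = []
        buffer: List[Record] = []
        for rec in self.records:
            buffer.append(rec)
            if len(buffer) >= self.buffer_limit:
                spills.append(sorted(buffer))  # each spill is written out sorted
                buffer = []
        if buffer:  # flush whatever is left when the map finishes
            spills.append(sorted(buffer))
        self.local_disk[self.task_id] = spills


def fetch_map_output(task: MapTask) -> List[Record]:
    """Reducer-side fetch: if the spilled output is gone, re-run the map to regenerate it."""
    if task.task_id not in task.local_disk:
        task.run()  # no replica exists, so redoing the task is the only recovery
    return [rec for spill in task.local_disk[task.task_id] for rec in spill]


if __name__ == "__main__":
    task = MapTask("m_000001", [("b", 1), ("a", 1), ("c", 1), ("a", 1), ("d", 1)])
    task.run()
    task.local_disk.clear()          # simulate losing the node's local disk
    print(fetch_map_output(task))    # output regenerated by a second run
    print(task.runs)
```

Because the spilled output lives only on one node's local disk, the sketch has no copy to fall back on; losing it always costs a second execution of the map.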

2010/9/27 Nan Zhu <zhunansjtu@gmail.com>

> Hi, all
> I'm not sure which mailing list I should send my question to; sorry for any
> inconvenience.
> I'm interested in how Hadoop currently handles the loss of intermediate data
> generated by map tasks. Some papers suggest that, when the data needed by
> reducers is lost, we should compare the cost of redoing the task with the
> cost of replicating the data: if redoing the task costs more, we can keep
> extra replicas of the intermediate data generated by the map so that
> reducers can still access it; otherwise, we just redo the corresponding map
> task when we detect the loss.
> I'm not sure what strategy Hadoop currently adopts, and I haven't found
> the code for this function. Can anyone give me some suggestions?
> Thank you
> Nan
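The trade-off described in the question can be sketched as a simple cost comparison. This is a hypothetical model, not Hadoop code: it assumes the redo cost is the map runtime weighted by the probability that the output is lost (replication is paid every time, while a redo is paid only on loss), and that replication costs the time to copy the output over the network once per extra replica. `MapOutputStats`, its fields and the numbers in the example are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class MapOutputStats:
    """Hypothetical per-task measurements feeding the decision."""
    map_runtime_s: float       # seconds needed to re-run the map task
    output_bytes: int          # size of the intermediate output
    bandwidth_bytes_s: float   # bytes/second available to copy it to another node
    loss_probability: float    # assumed chance the output is lost before all reducers fetch it


def expected_redo_cost(s: MapOutputStats) -> float:
    # A redo is only paid when the data is actually lost.
    return s.loss_probability * s.map_runtime_s


def replication_cost(s: MapOutputStats, extra_replicas: int = 1) -> float:
    # Replication is paid up front for every extra copy.
    return extra_replicas * s.output_bytes / s.bandwidth_bytes_s


def choose_strategy(s: MapOutputStats, extra_replicas: int = 1) -> str:
    """Replicate if redoing would cost more on average; otherwise just redo on loss."""
    if expected_redo_cost(s) > replication_cost(s, extra_replicas):
        return "replicate"
    return "redo"


if __name__ == "__main__":
    flaky_node = MapOutputStats(600.0, 10**9, 10**8, 0.05)   # ~30 s expected redo vs 10 s copy
    stable_node = MapOutputStats(600.0, 10**9, 10**8, 0.01)  # ~6 s expected redo vs 10 s copy
    print(choose_strategy(flaky_node), choose_strategy(stable_node))
```

The probability weighting is the main design choice here: comparing the raw redo time against the copy time would always favour replication for long-running maps, even on nodes that almost never lose data.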
