hadoop-common-user mailing list archives

From: Srigurunath Chakravarthi <srig...@yahoo-inc.com>
Subject: RE: sort at reduce side
Date: Wed, 03 Feb 2010 05:50:08 GMT
Hi Gang,

>kept in a map file. If so, to sort the data efficiently, the reducer
>would only read the index part of each spill (which is a map file) and
>sort the keys, instead of reading whole records from disk and sorting them.

AFAIK, no. Reducers always fetch the map output data itself, not the indexes (even if the
data is on the local node, where an index might be sufficient).
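
To illustrate what working on the fetched data (rather than on an index) looks like, here is a minimal sketch, not Hadoop's actual code. It assumes each fetched map output segment is already sorted by key, so the reduce-side "sort" amounts to a k-way merge of whole records. The names merge_segments, segment_from_map_0 and segment_from_map_1 are hypothetical and exist only for this example.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (key, value); a simplified stand-in for a map output record


def merge_segments(segments: List[Iterable[Record]]) -> Iterator[Record]:
    """K-way merge of segments that are each already sorted by key.

    The merge reads full (key, value) records from every segment;
    no separate key index is consulted.
    """
    return heapq.merge(*segments, key=lambda record: record[0])


# Hypothetical segments fetched from two map tasks (assumed sorted by key).
segment_from_map_0 = [("apple", "1"), ("cherry", "1")]
segment_from_map_1 = [("banana", "1"), ("cherry", "1")]

merged = list(merge_segments([segment_from_map_0, segment_from_map_1]))
for key, value in merged:
    print(key, value)
```

Because every input is already ordered, the merge only compares the head record of each segment at any time, which is why reading whole records does not require a full re-sort.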

Regards,
Sriguru

>-----Original Message-----
>From: Gang Luo [mailto:lgpublic@yahoo.com.cn]
>Sent: Wednesday, February 03, 2010 10:40 AM
>To: common-user@hadoop.apache.org
>Subject: sort at reduce side
>
>Hi all,
>I want to know some more details about the sorting at the reduce side.
>
>The intermediate result generated at the map side is stored as a map file,
>which actually consists of two sub-files: an index file and a data file.
>The index file stores the keys and can point to the corresponding records
>stored in the data file. What I think is that when the intermediate result
>(even only part of it for each mapper) is shuffled to the reducer, it is still
>kept in a map file. If so, to sort the data efficiently, the reducer
>would only read the index part of each spill (which is a map file) and
>sort the keys, instead of reading whole records from disk and sorting them.
>
>Does the reducer actually work the way I expect?
>
>-Gang