hadoop-common-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: sort at reduce side
Date Wed, 03 Feb 2010 17:11:54 GMT
2010/2/3 Srigurunath Chakravarthi <sriguru@yahoo-inc.com>:
> Hi Gang,
>
>>kept in a map file. If so, in order to sort the data efficiently, the reducer
>>actually only reads the index part of each spill (which is a map file) and
>>sorts the keys, instead of reading whole records from disk and sorting them.
>
> afaik, no. Reducers always fetch the map output data, not the indexes (even if the data
> is from the local node, where an index may be sufficient).
>
> Regards,
> Sriguru
>
>>-----Original Message-----
>>From: Gang Luo [mailto:lgpublic@yahoo.com.cn]
>>Sent: Wednesday, February 03, 2010 10:40 AM
>>To: common-user@hadoop.apache.org
>>Subject: sort at reduce side
>>
>>Hi all,
>>I want to know some more details about the sorting at the reduce side.
>>
>>The intermediate result generated at the map side is stored as a map file,
>>which actually consists of two sub-files, namely an index file and a data file.
>>The index file stores the keys and can point to the corresponding records
>>stored in the data file.  What I think is that when the intermediate result
>>(even only part of it for each mapper) is shuffled to the reducer, it is still
>>kept in a map file. If so, in order to sort the data efficiently, the reducer
>>actually only reads the index part of each spill (which is a map file) and
>>sorts the keys, instead of reading whole records from disk and sorting them.
>>
>>Does the reducer actually do what I expect?
>>
>>-Gang
>>
>>
>
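
To illustrate the point above that the reducer works on the fetched records themselves rather than on an index: as far as I understand it, each fetched map output segment is already sorted by key, so the reduce side mostly merges sorted runs instead of sorting from scratch. Below is a minimal Python sketch of that idea, not Hadoop's actual code. The function names and the sample data are made up for illustration.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_sorted_segments(segments):
    """Lazily merge already-sorted lists of (key, value) records by key.

    Each segment stands in for one fetched map output (an assumption of
    this sketch); whole records flow through the merge, not just keys.
    """
    return heapq.merge(*segments, key=itemgetter(0))


def group_by_key(merged):
    """Group a key-sorted record stream into (key, [values]) pairs."""
    for key, records in groupby(merged, key=itemgetter(0)):
        yield key, [value for _, value in records]


# Hypothetical sample data: three map outputs, each sorted by key.
segments = [
    [("apple", 1), ("cherry", 1)],
    [("apple", 2), ("banana", 5)],
    [("banana", 1), ("cherry", 7)],
]

grouped = list(group_by_key(merge_sorted_segments(segments)))
for key, values in grouped:
    print(key, values)
```

The merge only needs the head record of each segment at any time, which is why merging sorted runs is cheap even when the full data is fetched.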

With 0.20 and the TotalOrderPartitioner, isn't reduce-side sorting
possible now? Is that support we can/should add to Hive?
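
For context, here is a rough sketch of the idea behind total order partitioning as I understand it: pick split points from a sample of the keys, route each key to a partition by binary search over those split points, and let every reducer sort only its own range. Concatenating the reducer outputs in partition order then gives a globally sorted result. This is plain Python, not the Hadoop API. All function names and the sample keys are hypothetical.

```python
import bisect


def choose_split_points(sample_keys, num_partitions):
    """Pick num_partitions - 1 boundary keys from a sorted sample."""
    ordered = sorted(sample_keys)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def partition_for(key, split_points):
    """Binary-search the split points to find the partition for a key."""
    return bisect.bisect_right(split_points, key)


def total_order_sort(keys, split_points):
    """Route keys to range partitions, then sort each partition locally."""
    partitions = [[] for _ in range(len(split_points) + 1)]
    for key in keys:
        partitions[partition_for(key, split_points)].append(key)
    return [sorted(p) for p in partitions]


# Hypothetical input keys; here the sample is simply the full key set.
keys = ["pear", "apple", "kiwi", "fig", "plum", "banana"]
split_points = choose_split_points(keys, 3)
partitions = total_order_sort(keys, split_points)
flattened = [k for p in partitions for k in p]
print(split_points)
print(partitions)
```

How evenly the partitions fill depends on how well the sample reflects the real key distribution, so the sampling step matters as much as the partitioner itself.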
