hadoop-common-user mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: When exactly is combiner invoked?
Date Wed, 27 Jan 2010 18:13:10 GMT
But be careful, since combiners may execute "zero or more times"
depending upon mysterious internal logic. Relying upon combiners to do
significant work, as some of the Mahout clustering algorithms used to
do, will bite you.
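
To illustrate the "zero or more times" point, here is a minimal sketch in plain Python. It is a simulation, not Hadoop code, and every function name in it is hypothetical. It assumes a word-count style job: the sum-based reducer gives the same answer however many times the combiner runs, while a reducer that counts values changes its answer as soon as the combiner has run once.

```python
from collections import defaultdict

def combine_sum(pairs):
    """Hypothetical combiner: collapse (key, count) pairs to one pair per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

def reduce_sum(pairs):
    """Robust reducer: sums the values, so pre-combined input is fine."""
    return dict(combine_sum(pairs))

def reduce_count_values(pairs):
    """Fragile reducer: counts values, so it assumes the combiner never ran."""
    counts = defaultdict(int)
    for key, _ in pairs:
        counts[key] += 1
    return dict(counts)

def run_job(map_output, combiner_passes, reducer):
    """Apply the combiner the given number of times, then the reducer."""
    pairs = list(map_output)
    for _ in range(combiner_passes):
        pairs = combine_sum(pairs)
    return reducer(pairs)

map_output = [("a", 1), ("b", 1), ("a", 1), ("a", 1)]

# Same result for 0, 1, or 3 combiner passes.
robust = [run_job(map_output, n, reduce_sum) for n in (0, 1, 3)]

# Result depends on whether the combiner ran.
fragile = [run_job(map_output, n, reduce_count_values) for n in (0, 1)]
```

The design point is that a combiner should only be an optimization: the job has to be correct with the combiner skipped entirely or applied repeatedly.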


Gang Luo wrote:
> When the map function generates intermediate results and first sends them to the buffer, partitioning and sorting start, and if you specify a combiner, it is invoked at this time. This process runs in parallel with the map function. When the map function finishes, all the spills on disk are merged, and the combiner will also be invoked at this time.

> -Gang
> ----- Original Message ----
> From: Le Zhao <lezhao@cs.cmu.edu>
> To: common-user@hadoop.apache.org
> Sent: 2010/1/27 (Wed) 11:57:08 AM
> Subject: When exactly is combiner invoked?
> Hi - a combiner operates on a chunk of mapper output data, but where exactly is the chunk cut off, or when exactly will the chunk be fed to the combiner?
> 1. Will it be after the mapper finishes processing an input record?
> 2. Will it be after the mapper outputs a key-value pair that hits the memory limit?
> This is important to know, because strategy 1 gives a stronger guarantee about output record duplication than strategy 2, say when one input record for the mapper can produce multiple output records with the same key.
> Thanks,
> Le
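
Gang's description matches option 2 in the question: chunks are cut where the buffer spills, not at input record boundaries. The toy simulation below (plain Python, not Hadoop code) shows the consequence. All names are hypothetical, and `buffer_limit` counts records as a simplification, whereas the real buffer is sized in bytes. Output pairs from a single input record can land in different spills, so the same key can still appear more than once until the spills are merged.

```python
from collections import defaultdict

def combine(pairs):
    """Hypothetical sum combiner: one (key, total) pair per key, sorted by key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

def map_with_spills(records, map_fn, buffer_limit):
    """Run map_fn over records, spilling (and combining) when the buffer fills."""
    spills, buffer = [], []
    for record in records:
        for pair in map_fn(record):
            buffer.append(pair)
            if len(buffer) >= buffer_limit:
                # Spill boundary: may fall in the middle of one input record.
                spills.append(combine(buffer))
                buffer = []
    if buffer:
        # Final flush when the map function finishes.
        spills.append(combine(buffer))
    return spills

def tokenize(line):
    """Word-count style map function: emit (word, 1) for each word."""
    return [(word, 1) for word in line.split()]

spills = map_with_spills(["x x x", "y x"], tokenize, buffer_limit=2)

# Merge step: concatenate all spills and combine once more.
merged = combine([pair for spill in spills for pair in spill])
```

Here the three `x` pairs from the first input record end up in two different spills, and only the merge step brings all counts for `x` together.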
