hadoop-common-user mailing list archives

From Gang Luo <lgpub...@yahoo.com.cn>
Subject Re: When exactly is combiner invoked?
Date Wed, 27 Jan 2010 17:57:58 GMT
When the map function generates intermediate results and first writes them to the buffer, partitioning
and sorting start working and, if you specify a combiner, it is invoked at this
time. This process runs in parallel with the map function. When the map function finishes, all the
spills on disk are merged, and the combiner is also invoked at this time.
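Below is a minimal, single-threaded Java sketch of the flow described above. It assumes a word-count-style map that emits (word, 1) and a summing combiner. The class `CombinerSimulation` and its record-count `bufferCapacity` are hypothetical. Real Hadoop sizes the buffer in bytes (`io.sort.mb`), spills from a background thread while the map keeps running, handles multiple partitions, and does not always rerun the combiner during the final merge. This toy uses a single partition and always combines at both stages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Simplified simulation of map-side buffering, spilling, and combining.
 * NOT Hadoop's implementation: one partition, a record-count buffer
 * (hypothetical), and a summing "combiner".
 */
public class CombinerSimulation {
    // Hypothetical capacity in records; real Hadoop sizes the buffer in bytes.
    private final int bufferCapacity;
    private final List<Map.Entry<String, Integer>> buffer = new ArrayList<>();
    // Each spill is a sorted, already-combined run of key -> value.
    private final List<TreeMap<String, Integer>> spills = new ArrayList<>();

    public CombinerSimulation(int bufferCapacity) {
        this.bufferCapacity = bufferCapacity;
    }

    /** Called by the "map function" for every intermediate key-value pair. */
    public void emit(String key, int value) {
        buffer.add(Map.entry(key, value));
        if (buffer.size() >= bufferCapacity) {
            spill(); // buffer full: sort, combine, and write a spill
        }
    }

    /** Sort the buffered pairs by key and run the combiner before "writing" a spill. */
    private void spill() {
        if (buffer.isEmpty()) return;
        TreeMap<String, Integer> run = new TreeMap<>(); // TreeMap keeps keys sorted
        for (Map.Entry<String, Integer> e : buffer) {
            run.merge(e.getKey(), e.getValue(), Integer::sum); // combiner at spill time
        }
        spills.add(run);
        buffer.clear();
    }

    /** When the map finishes: flush the buffer, then merge all spills, combining again. */
    public TreeMap<String, Integer> finish() {
        spill();
        TreeMap<String, Integer> merged = new TreeMap<>();
        for (TreeMap<String, Integer> run : spills) {
            run.forEach((k, v) -> merged.merge(k, v, Integer::sum)); // combiner at merge time
        }
        return merged;
    }

    public int spillCount() {
        return spills.size();
    }

    public static void main(String[] args) {
        CombinerSimulation sim = new CombinerSimulation(3);
        for (String word : "a b a c a b".split(" ")) {
            sim.emit(word, 1);
        }
        TreeMap<String, Integer> out = sim.finish();
        System.out.println(sim.spillCount() + " spills, merged output: " + out);
    }
}
```

With a capacity of 3, the six emitted pairs produce two spills ({a=2, b=1} and {a=1, b=1, c=1}), and the final merge combines them into {a=3, b=2, c=1}.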


----- Original Message ----
From: Le Zhao <lezhao@cs.cmu.edu>
To: common-user@hadoop.apache.org
Sent: 2010/1/27 (Wed) 11:57:08 AM
Subject: When exactly is combiner invoked?

Hi, the combiner operates on a chunk of mapper output data, but where exactly is the chunk cut
off, or when exactly is the chunk fed to the combiner?

1. Will it be after the mapper finishes processing an input record?
2. Will it be after the mapper outputs a key value pair that hits the memory limit?

This is important to know, because strategy 1 gives a stronger guarantee regarding duplicate output
records than strategy 2, for example when one input record to the mapper can produce multiple output
records with the same key.
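The concern in this question can be illustrated with a small Java sketch. It assumes, per the reply above, that chunks are cut when the buffer fills rather than at input-record boundaries. The class `ChunkBoundaryDemo`, the method `combinePerChunk`, and the record-count capacity are hypothetical illustrations, not Hadoop APIs. The sketch shows that several same-key outputs from one input record can end up in different chunks, so the combiner sees them in separate calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical illustration: chunk boundaries follow buffer fill, not input records. */
public class ChunkBoundaryDemo {
    /**
     * Splits the emitted keys (each with value 1) into chunks of at most `capacity`
     * pairs and runs a summing combiner on each chunk independently.
     */
    public static List<TreeMap<String, Integer>> combinePerChunk(List<String> emittedKeys, int capacity) {
        List<TreeMap<String, Integer>> chunks = new ArrayList<>();
        TreeMap<String, Integer> current = new TreeMap<>();
        int count = 0;
        for (String key : emittedKeys) {
            current.merge(key, 1, Integer::sum); // combine within the current chunk only
            if (++count == capacity) {          // buffer full: cut the chunk here
                chunks.add(current);
                current = new TreeMap<>();
                count = 0;
            }
        }
        if (count > 0) chunks.add(current); // leftover partial chunk at the end
        return chunks;
    }

    public static void main(String[] args) {
        // "w" comes from an earlier input record; the three "x" pairs come from
        // a single map() call on one input record.
        List<String> emitted = List.of("w", "x", "x", "x");
        System.out.println(combinePerChunk(emitted, 2));
    }
}
```

Running `main` prints `[{w=1, x=1}, {x=2}]`: the three "x" outputs of one input record are split across two combiner calls. A combiner therefore should not rely on seeing all of a record's outputs for a key at once.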


