hadoop-common-user mailing list archives

From Gang Luo <lgpub...@yahoo.com.cn>
Subject Re: combiner statistics
Date Wed, 06 Jan 2010 05:55:32 GMT
Thanks. What I mean is, the combiner doesn't "intentionally" re-read spilled records back into
memory just to combine them. But it does happen that some records are re-read for the sort,
and I think the combiner should work on those records.


----- Original Message -----
From: Ted Xu <ted.xu.ml@gmail.com>
To: common-user@hadoop.apache.org
Sent: 2010/1/5 (Tue) 8:43:53 PM
Subject: Re: combiner statistics

Hi Gang,

> My understanding of this is that the combiner has to re-read some records
> which have already been spilled to disk and combine them with those records
> which come later.

I believe the combine operation is done before the map-side spill and after the
reduce-side merge. Combining happens only in memory, rather than re-reading
records from disk.

> Besides, I am not sure whether the combiner can guarantee there is only one
> record for each distinct key in each map task. Or does it just "try its
> best" to combine?

Yes, they can only "try their best".
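Neither mail spells out why a combiner can only "try its best", but the reason is mechanical: it runs per spill batch (and per merge pass), never over the whole map output at once, so the same key can still reach the reducer more than once. A minimal sketch of that behavior in plain Java (the class and method names are made up for illustration; this is not Hadoop's actual code):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model of "best effort" combining: each spill batch is combined
 * independently in memory, so a key can still appear once per spill,
 * and the reduce side must merge what remains.
 */
public class CombinerSketch {

    // Word-count style combine: sum the counts per key within one batch.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> batch) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> e : batch) {
            out.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two spill batches produced by the same map task.
        List<Map.Entry<String, Integer>> spill1 = List.of(
                Map.entry("a", 1), Map.entry("b", 1), Map.entry("a", 1));
        List<Map.Entry<String, Integer>> spill2 = List.of(
                Map.entry("a", 1), Map.entry("c", 1));

        Map<String, Integer> c1 = combine(spill1); // {a=2, b=1}
        Map<String, Integer> c2 = combine(spill2); // {a=1, c=1}

        // Key "a" survives in both combined outputs: one record per key
        // per spill, not per map task. The reduce-side merge finishes it.
        Map<String, Integer> reduced = new TreeMap<>(c1);
        c2.forEach((k, v) -> reduced.merge(k, v, Integer::sum));
        System.out.println(reduced); // {a=3, b=1, c=1}
    }
}
```

This is also why a combiner must be commutative and associative: the framework may apply it zero, one, or several times to any subset of a key's records without changing the final reduce result.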

