hadoop-common-dev mailing list archives

From "Amogh Vasekar" <vase...@yahoo-inc.com>
Subject combiner without reducer
Date Fri, 21 Nov 2008 05:48:46 GMT
Hi,
I believe a combiner is currently not run unless you have at least one
reducer set.
Leaving aside the Hadoop-18 semantics of the combiner running on both
sides (the number of reducers is 0 anyway, so I guess the
merge-combine doesn't come into the picture at all), I have a use case
where I would like to run a combiner without a reducer.
Basically, the aggregation I do (a lookup sort of thing) depends on a
relatively small dataset, and it is independent of the records in the
map input data that make up the input dataset, hence the motivation
for combine-without-reduce.
What I wanted to do was aggregate the similar records in the combiner
(or in a particular instance of the combiner) in a single shot, with
this forming my output. This would save me the intermediate I/O involved
in the shuffle and sort (S&S) phase, at some partial I/O cost on the
map + combine side, and I just wanted to try it out to see if it's
feasible at all.
Given that a combiner without a reducer is not supported, I was thinking
of doing it much the way Hadoop itself would: create a buffer, sort it,
and combine as I flush.
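
To make the buffer/sort/combine idea concrete, here is a rough sketch in
plain Java with no Hadoop dependencies. Everything in it is made up for
illustration: the class name BufferedCombiner, the capacity parameter,
and the choice of String keys with Long values are assumptions, not
anything Hadoop provides. In a real map-only job, collect() would
presumably be called from map() and the final flush() when the mapper is
closed, with the combined records going to the output collector instead
of a list.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical helper: buffers (key, value) pairs, then sorts and combines on flush.
final class BufferedCombiner {
    private final int capacity;                   // records held before a flush (assumed knob)
    private final BinaryOperator<Long> combineFn; // how two values for one key are merged
    private final List<Map.Entry<String, Long>> buffer = new ArrayList<>();
    private final List<Map.Entry<String, Long>> output = new ArrayList<>();

    BufferedCombiner(int capacity, BinaryOperator<Long> combineFn) {
        this.capacity = capacity;
        this.combineFn = combineFn;
    }

    // Add one record; flush automatically once the buffer is full.
    void collect(String key, long value) {
        buffer.add(new AbstractMap.SimpleEntry<>(key, value));
        if (buffer.size() >= capacity) {
            flush();
        }
    }

    // Sort the buffer by key, merge runs of equal keys, and emit one record per run.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        buffer.sort(Map.Entry.comparingByKey());
        String currentKey = buffer.get(0).getKey();
        Long acc = buffer.get(0).getValue();
        for (int i = 1; i < buffer.size(); i++) {
            Map.Entry<String, Long> e = buffer.get(i);
            if (e.getKey().equals(currentKey)) {
                acc = combineFn.apply(acc, e.getValue());
            } else {
                output.add(new AbstractMap.SimpleEntry<>(currentKey, acc));
                currentKey = e.getKey();
                acc = e.getValue();
            }
        }
        output.add(new AbstractMap.SimpleEntry<>(currentKey, acc));
        buffer.clear();
    }

    // Stand-in for the job output; a real mapper would write to its collector.
    List<Map.Entry<String, Long>> getOutput() {
        return output;
    }

    public static void main(String[] args) {
        BufferedCombiner c = new BufferedCombiner(4, Long::sum);
        c.collect("b", 1);
        c.collect("a", 2);
        c.collect("b", 3);
        c.collect("a", 4); // buffer is full here, so this triggers a flush
        c.collect("a", 5);
        c.flush();         // final flush, as would happen when the mapper closes
        System.out.println(c.getOutput()); // prints [a=6, b=4, a=5]
    }
}
```

Note that a key can show up once per flush (the "a" above appears
twice), so the output is only fully aggregated if all records for a key
fit in one buffer, or if partially combined output is acceptable.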
Any thoughts on this would be really helpful.

Thanks,
Amogh

