hadoop-common-user mailing list archives

From "Owen O'Malley" <owen.omal...@gmail.com>
Subject Re: Combiner phase question
Date Sat, 05 Dec 2009 01:09:00 GMT
The combiner runs whenever the framework spills the intermediate output to
disk. So the flow looks like:

in map:
  map writes into buffer
  when buffer is "full" do a quick sort, combine and write to disk
  merge sort the partial outputs from disk, combine and write to disk

in reduce:
  fetch output from maps into buffer
  when buffer is "full" do a merge sort, combine and write to disk
  merge sort the partial outputs and feed to the reduce
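
As a rough illustration of the map-side half of that flow, here is a toy
Python model. It is only a sketch, not Hadoop code: the names `combine` and
`map_side` and the `buffer_capacity` parameter are made up for the example,
the buffer is measured in records rather than bytes, and the combiner is
assumed to be a word-count style sum.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def combine(sorted_pairs):
    """Sum the values of each key in a key-sorted stream of (key, value) pairs."""
    return [(key, sum(value for _, value in group))
            for key, group in groupby(sorted_pairs, key=itemgetter(0))]


def map_side(records, buffer_capacity):
    """Toy map-side flow: buffer, sort + combine on each spill, then merge + combine."""
    buffer, spills = [], []
    for pair in records:
        buffer.append(pair)                          # map writes into buffer
        if len(buffer) >= buffer_capacity:           # buffer is "full"
            spills.append(combine(sorted(buffer)))   # sort, combine, "write to disk"
            buffer = []
    if buffer:                                       # flush the remainder as a last spill
        spills.append(combine(sorted(buffer)))
    merged = heapq.merge(*spills)                    # merge sort the partial outputs
    return combine(merged), len(spills)              # combine again on the merged stream


records = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("a", 1), ("b", 1)]
output, spill_count = map_side(records, buffer_capacity=4)
print(output, spill_count)
```

With a capacity of 4 records the six inputs produce two spills, and the
combiner runs once per spill plus once more during the final merge. The
reduce side follows the same pattern, with fetched map output taking the
place of the map's own writes.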

So in general the combiner runs as many times as the framework needs to spill
to disk. It all depends on the data sizes. The 0 times case is rare, but it
can happen if a partition has a single value in it (because it is very, very
large).
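
Since the number of combiner runs depends on the data sizes, the job should
produce the same answer however many times the combiner is applied. The
sketch below checks that for a simple sum. It is a toy model under stated
assumptions: `combine`, `reduce_sum`, `reduce_after_combines`, `chunk_size`
and `passes` are hypothetical names for this example, and each chunk stands
in for one spill.

```python
def combine(values):
    """Toy combiner: collapse a chunk of values for one key into a partial sum."""
    return [sum(values)]


def reduce_sum(values):
    """Toy reducer: total of whatever values reach it for one key."""
    return sum(values)


def reduce_after_combines(values, chunk_size, passes):
    """Run the combiner `passes` times over chunks of `chunk_size`, then reduce."""
    for _ in range(passes):
        chunks = [values[i:i + chunk_size]
                  for i in range(0, len(values), chunk_size)]
        values = [out for chunk in chunks for out in combine(chunk)]
    return reduce_sum(values)


values = [3, 1, 4, 1, 5, 9, 2, 6]
# passes=0 models the combiner never running; higher values model repeated spills.
results = {passes: reduce_after_combines(values, chunk_size=3, passes=passes)
           for passes in range(4)}
print(results)
```

Every entry in `results` is the same total, which holds here because
addition is associative and commutative. A combiner that, say, computed an
average directly would not pass this check.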

-- Owen
