hadoop-mapreduce-dev mailing list archives

From "Owen O'Malley" <omal...@apache.org>
Subject Re: [RT] map reduce "pipelines"
Date Wed, 09 Jun 2010 18:50:39 GMT
At Yahoo, we had a framework that was similar to MapReduce called
Dreadnaught. When we were converting applications off of Dreadnaught
to Hadoop MapReduce, we considered supporting M-R-R. (Dreadnaught
imposes few restrictions on the application and could support M, M-R,
M-R-R, etc.) The problem is that supporting the retry semantics
arbitrarily far back can cause a single node failure to launch more
and more work. By putting a checkpoint after each reduce (durable as long
as the HDFS replica count is > 1), M-R has a bounded amount of rework that
can be required and relatively simple error recovery. Hadoop is better off
doing a good job at supporting MapReduce than a bad job on more
complex pipelines.
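
To make the bounded-rework point concrete, here is a toy sketch in Python.
It is not Hadoop or Dreadnaught code. The function names and the model are
assumptions made for illustration: a pipeline is a list of stages, a stage's
output is either checkpointed to replicated storage or kept only on the node
that produced it, and a node failure forces a re-run back to the nearest
upstream checkpoint.

```python
def stages_to_rerun(failed_stage, checkpointed):
    """Return the stage indices that must be redone after a node failure.

    Toy model (an assumption, not Hadoop's actual recovery logic):
    checkpointed[i] is True when stage i's output sits in replicated
    storage and survives the failure. The pipeline's original input is
    assumed to be durable.
    """
    rerun = [failed_stage]
    upstream = failed_stage - 1
    # Walk back until we reach a stage whose output survived the failure.
    while upstream >= 0 and not checkpointed[upstream]:
        rerun.append(upstream)
        upstream -= 1
    return sorted(rerun)


def worst_case_rework(checkpointed):
    """Largest number of stages any single failure can force us to redo."""
    return max(
        len(stages_to_rerun(stage, checkpointed))
        for stage in range(len(checkpointed))
    )


# Stages are M, R, R, R. Only the final output is checkpointed.
chained = [False, False, False, True]
# Same stages, but every reduce writes a checkpoint.
checkpoint_per_reduce = [False, True, True, True]

print(worst_case_rework(chained))                # grows with pipeline length
print(worst_case_rework(checkpoint_per_reduce))  # never more than one M and one R
```

In this model the worst case for the unchecked chain equals the number of
stages, while a checkpoint after each reduce caps it at two stages no matter
how long the pipeline gets.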

For pipelines, I'd strongly suggest using Pig or Hive, which do the
cross-job optimizations for you...

-- Owen
