hadoop-common-user mailing list archives

From Vuk Ercegovac <verc...@us.ibm.com>
Subject Re: Tech Talk: Dryad
Date Fri, 09 Nov 2007 19:56:35 GMT
I can make a patch available, but the prototype was developed on version
0.13 and it includes some work on joins. I'd prefer to bring it up to the
trunk version and focus only on the composable aspect (one thing at a
time...).

With regard to check-pointing, I did mention (and Stu also caught this)
that there is the possibility of storing the output of a composed operator
to HDFS (for checkpointing). This would result in the subsequent operator
(a reduce) reading its input from HDFS, which differs from the current
implementation, whereby reducers read from files stored at the mappers.
Perhaps generalizing MapOutputLocation would be a way to approach this
difference, but the current prototype did not consider it.
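
To make the idea concrete, here is a rough sketch of such a checkpointed
pipeline written as a two-job driver against the old (0.13-era) mapred
API. ComposedMapper and MyReducer are hypothetical stand-ins, and the
actual prototype composes operators inside a single job rather than
splitting it in two as this sketch does:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileInputFormat;
    import org.apache.hadoop.mapred.SequenceFileOutputFormat;

    public class CheckpointedPipeline {
      public static void main(String[] args) throws Exception {
        Path input = new Path(args[0]);
        Path checkpoint = new Path(args[1]); // HDFS checkpoint dir
        Path output = new Path(args[2]);

        // Stage 1: the composed (map-only) operator; its output is
        // materialized in HDFS rather than left on mapper-local disk.
        JobConf stage1 = new JobConf(CheckpointedPipeline.class);
        stage1.setJobName("composed-maps");
        stage1.setMapperClass(ComposedMapper.class); // hypothetical
        stage1.setNumReduceTasks(0);                 // map-only
        stage1.setOutputFormat(SequenceFileOutputFormat.class);
        stage1.setOutputKeyClass(Text.class);
        stage1.setOutputValueClass(Text.class);
        stage1.setInputPath(input);
        stage1.setOutputPath(checkpoint);
        JobClient.runJob(stage1);

        // Stage 2: the reduce reads its input back from the HDFS
        // checkpoint instead of fetching it from the mappers.
        JobConf stage2 = new JobConf(CheckpointedPipeline.class);
        stage2.setJobName("reduce-from-checkpoint");
        stage2.setReducerClass(MyReducer.class);     // hypothetical
        stage2.setInputFormat(SequenceFileInputFormat.class);
        stage2.setOutputKeyClass(Text.class);
        stage2.setOutputValueClass(Text.class);
        stage2.setInputPath(checkpoint);
        stage2.setOutputPath(output);
        JobClient.runJob(stage2);
      }
    }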

In any event, I am interested to hear whether there are other thoughts out
there that would both simplify the approach and address the issue of
failures. So far, the feedback is "too complicated" and "not scalable with
regard to failures" when compared to the benefits. Writing to HDFS within
a composed job, while still saving passes over the data, may simplify the
approach and handle failures better. If there is a (reasonably) simple
solution that addresses failures (correctness and cost), would there be
interest?

             "Stu Hood"                                                    
             us>                                                        To 
             11/09/2007 10:02                                           cc 
                                       Re: Tech Talk: Dryad                
             Please respond to                                             

I did read the conclusion of the previous thread, which was that nobody
"thought" that the performance gains would be worth the added complexity.
I simply think that if a patch is available, the developer should be
encouraged to submit it for review, since the topic has been discussed so
much already.

I think our concept of "the" map/reduce primitive has been limited in scope
to the capabilities that Google described. There is no reason not to
explore potentially beneficial additions (even if Google didn't think they
were worthwhile).

Yes, Dryad is more confusing because it uses a more flexible primitive.
I'm not suggesting that Hadoop should be rewritten to use a DAG
at its core, but we do already have the o.a.h.m.jobcontrol.JobControl
module, so _somebody_ must think the concept is useful.
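
For reference, chaining dependent jobs with JobControl looks roughly like
the sketch below (a minimal example; the two JobConfs are assumed to be
configured elsewhere, with the second job's input path pointing at the
first job's output directory):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    public class ChainedJobs {
      public static void main(String[] args) throws Exception {
        JobConf first = new JobConf();  // assume mapper/reducer/paths set
        JobConf second = new JobConf(); // assume input = first's output

        Job job1 = new Job(first);
        Job job2 = new Job(second);
        job2.addDependingJob(job1); // job2 runs only after job1 succeeds

        JobControl control = new JobControl("chained");
        control.addJob(job1);
        control.addJob(job2);

        Thread runner = new Thread(control); // JobControl is a Runnable
        runner.start();
        while (!control.allFinished()) {
          Thread.sleep(1000);
        }
        control.stop();
      }
    }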

Re: Side note: As the presenter explained, he uses a small example first to
demonstrate the linear speedup. Next (around the 32-minute mark) he goes to
an example of sorting 10TB on 1800 machines in ~12 minutes...


-----Original Message-----
From: Owen O'Malley <oom@yahoo-inc.com>
Sent: Friday, November 9, 2007 12:32pm
To: hadoop-user@lucene.apache.org
Subject: Re: Tech Talk: Dryad

On Nov 9, 2007, at 8:49 AM, Stu Hood wrote:

> Currently there is no sanctioned method of 'piping' the reduce
> output of one job directly into the map input of another (although
> it has been discussed: see the thread I linked before:
> http://www.nabble.com/Poly-reduce--tf4313116.html ).

Did you read the conclusion of the previous thread? The performance
gains in avoiding the second map input are trivial compared to the
gains in simplicity of having a single data path and re-execution story.
During a reasonably large job, roughly 98% of your maps are reading
data on the _same_ node. Once we put in rack locality, it will be
even better.

I'd much, much rather build the map/reduce primitive and support it
very well than add the additional complexity of any sort of
poly-reduce. I think it is very appropriate for systems like Pig to
include that kind of optimization, but it should not be part of the
base framework.

I watched the front of the Dryad talk and was struck by how complex
it quickly became. It does give the application writer a lot of
control, but to do the equivalent of a map/reduce sort with 100k maps
and 4k reduces with automatic spill-over to disk during the shuffle
seemed _really_ complicated.

On a side note, in the part of the talk that I watched, the scaling
graph went from 2 to 9 nodes. Hadoop's scaling graphs go to 1000's of
nodes. Did they ever suggest later in the talk that it scales up higher?

-- Owen
