hadoop-common-user mailing list archives

From "Dmitry" <dmitrytka...@hotmail.com>
Subject Re: Tech Talk: Dryad
Date Sat, 10 Nov 2007 02:30:35 GMT
I am not sure that I understand the issue, but I would be very interested to 
know what you are talking about. Could you please describe the problem in 
more detail?
----- Original Message ----- 
From: "Doug Cutting" <cutting@apache.org>
To: <hadoop-user@lucene.apache.org>
Sent: Friday, November 09, 2007 8:20 AM
Subject: Re: Tech Talk: Dryad

> Stu Hood wrote:
>> The slide comparing the time taken to spill to disk between vertices vs 
>> operating purely in memory (around minute 26) is definitely something to 
>> think about.
> I have not had a chance to watch the video yet, but, in MapReduce, if the 
> intermediate dataset is larger than the RAM on your cluster, then you must 
> spill to disk in order to sort.  (When it is smaller, then we should of 
> course avoid disk, but that's not the typical case.)  If you don't sort, 
> then it's just map, and piping a sequence of maps together is trivial to 
> do on the same host, no need to even move the data over the wire.  So I 
> don't yet see the direct relevance.  What am I missing? (Maybe I should 
> watch the video...)
> Doug
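
To make the point in the quoted message concrete, here is a minimal sketch of the two cases Doug describes: chaining maps on one host with no sort, and sorting an intermediate dataset that must spill to disk once it exceeds memory. This is not Hadoop's implementation. The names `chain_maps`, `sort_with_spill` and the `max_in_memory` record-count threshold are hypothetical, and records are assumed to be strings without embedded newlines.

```python
import heapq
import os
import tempfile
from typing import Callable, Iterable, Iterator, List


def chain_maps(records: Iterable[str], *fns: Callable[[str], str]) -> Iterator[str]:
    """Pipe a sequence of map functions together on one host; no sort, no disk."""
    for record in records:
        for fn in fns:
            record = fn(record)
        yield record


def _spill(run: List[str], tmpdir: str) -> str:
    """Sort one in-memory run and write it to a temp file, one record per line."""
    fd, path = tempfile.mkstemp(dir=tmpdir, text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(r + "\n" for r in sorted(run))
    return path


def _read_run(path: str) -> Iterator[str]:
    """Stream a spilled run back from disk in sorted order."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")


def sort_with_spill(records: Iterable[str], max_in_memory: int) -> List[str]:
    """Sort records, spilling sorted runs to disk when the buffer fills up.

    max_in_memory is a hypothetical stand-in for available RAM, counted in
    records. The result is returned as a list only to keep the sketch short;
    a real system would stream the merged output instead.
    """
    buffer: List[str] = []
    with tempfile.TemporaryDirectory() as tmpdir:
        run_paths: List[str] = []
        for record in records:
            buffer.append(record)
            if len(buffer) >= max_in_memory:
                run_paths.append(_spill(buffer, tmpdir))
                buffer = []
        if not run_paths:
            # Everything fit in memory, so disk is avoided entirely.
            return sorted(buffer)
        if buffer:
            run_paths.append(_spill(buffer, tmpdir))
        # Merge the sorted runs while reading them back from disk.
        return list(heapq.merge(*(_read_run(p) for p in run_paths)))
```

When the input fits under the threshold, `sort_with_spill` never touches disk; otherwise each full buffer becomes a sorted run on disk and the runs are merged at the end, which is the cost the slide in the talk compares against operating purely in memory.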
