hadoop-common-user mailing list archives

From Martin Jaggi <m.ja...@gmail.com>
Subject Re: In memory Map Reduce
Date Sun, 08 Jun 2008 20:05:12 GMT
Are there any statistics available to monitor what percentage of the
pairs remains in memory and what percentage is written to disk?

Or what are the exceptional cases you mention?


> Hadoop goes to some lengths to make sure that things can stay in memory
> as much as possible. There are still cases, however, where intermediate
> results are normally written to disk. That means that implementors will
> have those time scales in their heads as they do things, which will
> inevitably make the trade-offs somewhat poor compared to a system that
> never envisions intermediate data being written to disk.
>
> But other than guessing like this, I couldn't actually say how it would
> turn out, except that for very short jobs, moving jar files around and
> other startup costs can be the dominant cost.
>
> On Sun, Jun 1, 2008 at 5:05 AM, Martin Jaggi <m.jaggi@gmail.com> wrote:
>
>>
>> So in the case that all intermediate pairs fit into the RAM of the
>> cluster, does the InMemoryFileSystem already allow the intermediate
>> phase to be done without much disk access? Or what would be the
>> current bottleneck in Hadoop in this scenario (huge computational
>> load, not so much data in/out), in your opinion?
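
To make the memory-versus-disk trade-off in this thread concrete, here is a minimal toy model in Java. It is a hypothetical sketch, not how Hadoop implements its buffers: the class `SpillModel`, its methods, and the record-count capacity are all illustrative assumptions. Intermediate records accumulate in a bounded in-memory buffer; when the buffer is full it is sorted and "spilled" (here only counted), and `spilledFraction()` reports the kind of percentage asked about above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical toy model (NOT Hadoop's implementation) of a bounded in-memory
 * buffer for intermediate records that spills to "disk" when it is full.
 */
final class SpillModel {
    private final int capacity;                     // max records held in memory (assumed unit: records)
    private final List<String> buffer = new ArrayList<>();
    private long spilledRecords = 0;                // records that had to leave memory
    private int spills = 0;                         // number of spill events

    SpillModel(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Adds one intermediate record, spilling the buffer first if it is already full. */
    void collect(String record) {
        if (buffer.size() == capacity) {
            spill();
        }
        buffer.add(record);
    }

    /** Sorts the buffered records and counts them as written out; a real system would write a file here. */
    private void spill() {
        Collections.sort(buffer);
        spilledRecords += buffer.size();
        spills++;
        buffer.clear();
    }

    long spilledRecords() { return spilledRecords; }
    int inMemoryRecords() { return buffer.size(); }
    int spills() { return spills; }

    /** Fraction of all collected records that had to be spilled (0.0 if nothing was collected). */
    double spilledFraction() {
        long total = spilledRecords + buffer.size();
        return total == 0 ? 0.0 : (double) spilledRecords / total;
    }

    public static void main(String[] args) {
        SpillModel fits = new SpillModel(1000);     // all pairs fit in memory
        SpillModel tooSmall = new SpillModel(300);  // buffer smaller than the data
        for (int i = 0; i < 1000; i++) {
            fits.collect("key" + i);
            tooSmall.collect("key" + i);
        }
        System.out.println(fits.spilledFraction());     // 0.0
        System.out.println(tooSmall.spilledFraction()); // 0.9 (three spills of 300 records)
    }
}
```

In the scenario from the original question, where every intermediate pair fits in memory, the model never spills; once the buffer is smaller than the data, most records leave memory. Real systems differ in many details (for instance, merged spill files may be rewritten more than once), so this only illustrates the shape of such a statistic.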

