hadoop-common-user mailing list archives

From "Ted Dunning" <ted.dunn...@gmail.com>
Subject Re: In memory Map Reduce
Date Sun, 01 Jun 2008 17:13:30 GMT
Hadoop goes to some lengths to make sure that things can stay in memory as
much as possible.  There are still cases, however, where intermediate
results are normally written to disk.  That means that implementors will
have disk time scales in mind as they design things, which will inevitably
make the trade-offs somewhat poor compared to a system that never envisions
intermediate data being written to disk.

But other than guessing like this, I couldn't actually say how it would turn
out, except that for very short jobs, moving jar files around and other
startup costs can dominate the total run time.
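
To make the point about short jobs concrete, here is a minimal sketch of a
fixed-startup cost model. The function name and the 20-second startup figure
are assumptions chosen for illustration, not measurements from Hadoop.

```python
def startup_fraction(startup_s: float, compute_s: float) -> float:
    """Return the share of total job time spent on fixed startup work."""
    total = startup_s + compute_s
    if total <= 0:
        raise ValueError("total job time must be positive")
    return startup_s / total


# Hypothetical fixed cost: shipping jar files, launching tasks, and so on.
STARTUP_S = 20.0

if __name__ == "__main__":
    # Compare a very short, a medium, and a long job (compute time in seconds).
    for compute_s in (5.0, 60.0, 3600.0):
        share = startup_fraction(STARTUP_S, compute_s)
        print(f"compute={compute_s:>6.0f}s  startup share={share:.1%}")
```

Under these assumed numbers the fixed cost is most of the run time for the
shortest job and a negligible share for the hour-long one, which is why
in-memory intermediate data alone may not help very short jobs much.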

On Sun, Jun 1, 2008 at 5:05 AM, Martin Jaggi <m.jaggi@gmail.com> wrote:

> So in the case that all intermediate pairs fit into the RAM of the cluster,
> does the InMemoryFileSystem already allow the intermediate phase to be done
> without much disk access? Or what would be the current bottleneck in Hadoop
> in this scenario (huge computational load, not so much data in/out)
> according to your opinion?
