hadoop-common-user mailing list archives

From "Michele Catasta" <michele.cata...@deri.org>
Subject Re: Poly-reduce?
Date Thu, 23 Aug 2007 21:46:11 GMT
Hi all,
I'm only a month into using Hadoop too, and it sounds like we are all
wishing for this kind of feature.

> 2. Map tasks of the next step are streamed data directly from preceding
> reduce tasks. This is more along the lines Ted is suggesting - make
> iterative map-reduce a primitive natively supported in Hadoop. This is
> probably a better solution - but more work?

I would basically like to do the same, with one mandatory condition:
no spilling of data into temporary files. Keeping all the files that
the reduce outputs in RAM would be great in my context.

Maybe the solution could be an instance of InMemoryFileSystem, just
passing the reference from the Reduce to the next Map (using an
external daemon, which sounds to me like the only viable pattern for
MapReduce chaining; correct me if I'm wrong)?
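
To make the idea concrete, here is a minimal sketch in plain Python of
chaining two map-reduce passes where the reduce output is handed to the
next map as an in-memory list, with no temporary files. This is not
Hadoop code and does not use InMemoryFileSystem: run_stage, run_chain,
and the example mappers and reducers are hypothetical names, and
everything runs in a single process, so it leaves the distribution
question open.

```python
from collections import defaultdict


def run_stage(records, mapper, reducer):
    """Run one map-reduce pass in memory and return the reduce output as a list."""
    groups = defaultdict(list)
    # Map phase: each record may emit any number of (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: visit keys in sorted order, as a shuffle/sort step would.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def run_chain(records, stages):
    """Feed each stage's reduce output straight into the next stage's map."""
    for mapper, reducer in stages:
        records = run_stage(records, mapper, reducer)
    return records


# Stage 1: word count.
def split_words(line):
    for word in line.split():
        yield word, 1


def sum_counts(word, counts):
    yield word, sum(counts)


# Stage 2: group words by how often they occurred.
def invert(pair):
    word, count = pair
    yield count, word


def collect_words(count, words):
    yield count, sorted(words)


lines = ["a b a", "b c a"]
result = run_chain(lines, [(split_words, sum_counts), (invert, collect_words)])
print(result)
```

The hand-off between stages is just the Python list returned by
run_stage; in a real cluster that reference would have to be replaced
by some mechanism for moving reduce output between nodes.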

Would the in-RAM file system be distributed across all the nodes?

Any working solution would be greatly appreciated :) I am only
speculating; the truth is that I still don't have a clue about it.

-Michele Catasta
