hadoop-common-user mailing list archives

From "Edward Capriolo" <edlinuxg...@gmail.com>
Subject Re: Realtime Map Reduce = Supercomputing for the Masses?
Date Sun, 01 Jun 2008 14:25:22 GMT
I think that feature makes sense because starting a JVM has overhead.
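
To make that idea concrete, here is a rough sketch of what running tasks as
threads inside a shared JVM could look like. This is only an illustration of
the thread-per-task model, not the actual TaskRunner change Christophe
describes below, and the class and method names are made up:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: run each task as a thread in the shared
// TaskTracker JVM instead of forking a child JVM per task. The names
// below (InProcessTaskRunner, Task) are illustrative only, not the
// real Hadoop TaskRunner API.
public class InProcessTaskRunner {

    // Stand-in for a map or reduce task; a real implementation would
    // wrap org.apache.hadoop.mapred.Task instead.
    public interface Task {
        void run() throws Exception;
    }

    private final ExecutorService pool;

    public InProcessTaskRunner(int slots) {
        // One thread per task slot; no per-task JVM startup cost.
        this.pool = Executors.newFixedThreadPool(slots);
    }

    public void submit(final Task task) {
        pool.submit(new Callable<Void>() {
            public Void call() throws Exception {
                // Runs in-process, so a task that leaks memory or calls
                // System.exit() now affects the whole JVM. That isolation
                // is what the child-JVM model buys us, and it is the main
                // trade-off of this approach.
                task.run();
                return null;
            }
        });
    }

    public void shutdown() {
        pool.shutdown();
    }
}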

On Sun, Jun 1, 2008 at 4:26 AM, Christophe Taton <taton@apache.org> wrote:
> Actually Hadoop could be made more friendly to such realtime Map/Reduce
> jobs.
> For instance, we could consider running all tasks inside the TaskTracker
> JVM as separate threads, which could be implemented as another personality
> of the TaskRunner.
> I was looking into this a couple of weeks ago...
> Would you be interested in such a feature?
>
> Christophe T.
>
>
> On Sun, Jun 1, 2008 at 10:08 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
>> Hadoop is highly optimized towards handling datasets that are much too
>> large to fit into memory.  That means that there are many trade-offs that
>> have been made that make it much less useful for very short jobs or jobs
>> that would fit into memory easily.
>>
>> Multi-core implementations of map-reduce are very interesting for a number
>> of applications, as are in-memory implementations for distributed
>> architectures.  I don't think that anybody really knows yet how well these
>> other implementations will play with Hadoop.  The regimes that they are
>> designed to optimize are very different in terms of data scale, number of
>> machines and networking speed.  All of these constraints drive the design
>> in innumerable ways.
>>
>> On Sat, May 31, 2008 at 7:51 PM, Martin Jaggi <m.jaggi@gmail.com> wrote:
>>
>> > Concerning real-time Map Reduce within (and not only between) machines
>> > (multi-core & GPU), e.g. the Phoenix and Mars frameworks:
>> >
>> > I'm really interested in very fast Map Reduce tasks, i.e. without much
>> > disk access. With the rise of multi-core systems, this could get more
>> > and more interesting, and could maybe even lead to something like
>> > 'super-computing for everyone', or is that a bit too ambitious? Anyway I
>> > was pleasantly surprised to see the recent Phoenix
>> > (http://csl.stanford.edu/~christos/sw/phoenix/) implementation of Map
>> > Reduce for multi-core CPUs (they won the best paper award at HPCA'07).
>> >
>> > Recently GPU computing was in the news again as well, pushed by Nvidia
>> > (see CUDA: http://www.nvidia.com/object/cuda_showcase.html ), and there
>> > too a Map Reduce implementation called Mars has become available:
>> > http://www.cse.ust.hk/gpuqp/Mars_tr.pdf
>> > The Mars people say at the end of their paper: "We are also interested
>> > in integrating Mars into the existing Map Reduce implementations such as
>> > Hadoop so that the Map Reduce framework can take the advantage of the
>> > parallelism among different machines as well as the parallelism within
>> > each machine."
>> >
>> > What do you think of this, especially about the multi-core approach? Do
>> > you think these needs are already served by Hadoop's current
>> > InMemoryFileSystem, or not? Are there any plans to 'integrate' one of
>> > the two frameworks above?
>> > Or would this already be addressed by reducing the significant overhead
>> > of intermediate data pairs
>> > (https://issues.apache.org/jira/browse/HADOOP-3366 )?
>> >
>> > Any comments?
>> >
>>
>
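
As for the multi-core question Martin raises above: the kernel of an
in-memory, Phoenix-style map-reduce is small enough to sketch in plain Java.
This is not Phoenix or Mars code and has nothing to do with Hadoop's
TaskRunner; it is only meant to show the shape of the idea, with the map
phase spread across cores and the intermediate pairs never touching disk:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: a tiny in-memory word count where the map tasks run
// on separate cores and the intermediate pairs stay in memory.
public class InMemoryWordCount {

    public static Map<String, Integer> run(List<String> chunks, int cores)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        List<Future<Map<String, Integer>>> partials =
                new ArrayList<Future<Map<String, Integer>>>();

        // Map phase: one task per input chunk, each producing a partial
        // count that never touches disk.
        for (final String chunk : chunks) {
            partials.add(pool.submit(new Callable<Map<String, Integer>>() {
                public Map<String, Integer> call() {
                    Map<String, Integer> counts =
                            new HashMap<String, Integer>();
                    for (String word : chunk.split("\\s+")) {
                        if (word.length() == 0) continue;
                        Integer c = counts.get(word);
                        counts.put(word, c == null ? 1 : c + 1);
                    }
                    return counts;
                }
            }));
        }

        // Reduce phase: merge the partial maps on the calling thread.
        Map<String, Integer> result = new HashMap<String, Integer>();
        for (Future<Map<String, Integer>> f : partials) {
            for (Map.Entry<String, Integer> e : f.get().entrySet()) {
                Integer c = result.get(e.getKey());
                result.put(e.getKey(),
                        c == null ? e.getValue() : c + e.getValue());
            }
        }
        pool.shutdown();
        return result;
    }
}

The hard part, as the thread above suggests, would be making something like
this cooperate with Hadoop's existing shuffle and intermediate-data handling
rather than living beside it.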
