hadoop-common-user mailing list archives

From Trevor Strohman <stroh...@cs.umass.edu>
Subject Re: Combining MapReduce implementations
Date Wed, 11 Oct 2006 17:42:29 GMT

On Oct 11, 2006, at 12:32 PM, Doug Cutting wrote:

> Yes.  We'd like Hadoop's MapReduce to be able to live on top of  
> such systems [Grid Engine].  Some are already experimenting with  
> Hadoop on Condor, but I've not yet heard of anyone using Hadoop on  
> Sun's Grid Engine.
> http://issues.apache.org/jira/browse/HADOOP-428
> http://www.cs.wisc.edu/condor/CondorWeek2006/presentations/paranjpye_yahoo_condor.ppt
> [...]  I'm also interested in supporting Amazon's EC2 [...]

That's good to hear.  For our own hardware, the critical issue is how  
we can share the resources efficiently with other people.  Right now  
there are lots of people using these machines, and I'm the only one  
using MapReduce.  Some people want to use MPI, some want to run  
standard applications that use NFS, etc.  Grid Engine almost  
completely solves this sharing problem for us.

EC2 support sounds exciting.

>> Record I/O: [ ...]
>> and my TypeBuilder class generates code for all possible orderings  
>> of this class (order by word, order by count, order by word then  
>> count, order by count then word).  Each ordering has its own hash  
>> function and comparator.
>> In addition, each ordering has its own serialization/deserialization
>> code.  For example, if we order by count, the
>> serialization code stores only differences between adjacent counts  
>> to help with compression.
>> Is this code you'd be interested in?
> Yes, this sounds very interesting.  Does it build on the Record IO  
> classes or is it completely separate?

I'm afraid it's completely separate, although it's not much code.   
The TypeBuilder is ~600 lines of code right now, plus maybe 500 lines  
of additional support (compression classes, etc.).
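For anyone who hasn't seen the idea, here is a rough Python sketch of what "one comparator and one hash function per ordering" means for the word/count record from the quoted message. It is not the TypeBuilder code: TypeBuilder generates code ahead of time, while this sketch builds closures at runtime, and the names `WordCount`, `all_orderings`, `make_ordering`, `ORDERINGS` and `sort_by` are hypothetical.

```python
from dataclasses import dataclass
from functools import cmp_to_key
from itertools import permutations
from typing import Callable, Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class WordCount:
    """Hypothetical two-field record: a word and how often it occurred."""
    word: str
    count: int


FIELDS = ("word", "count")

Ordering = Tuple[str, ...]
Comparator = Callable[[WordCount, WordCount], int]
HashFn = Callable[[WordCount], int]


def all_orderings(fields: Tuple[str, ...]) -> Iterator[Ordering]:
    """Yield every non-empty ordered subset of the fields.

    For ("word", "count") this gives ("word",), ("count",),
    ("word", "count") and ("count", "word").
    """
    for n in range(1, len(fields) + 1):
        yield from permutations(fields, n)


def make_ordering(order: Ordering) -> Tuple[Comparator, HashFn]:
    """Build a comparator and a hash function that look only at `order`."""
    def key(rec: WordCount) -> tuple:
        return tuple(getattr(rec, f) for f in order)

    def compare(a: WordCount, b: WordCount) -> int:
        ka, kb = key(a), key(b)
        return (ka > kb) - (ka < kb)  # -1, 0 or 1

    def hash_fn(rec: WordCount) -> int:
        # Hash exactly the fields the comparator uses, so records that
        # compare equal under this ordering always hash equally.
        return hash(key(rec))

    return compare, hash_fn


# One (comparator, hash function) pair per ordering, keyed by field tuple.
ORDERINGS: Dict[Ordering, Tuple[Comparator, HashFn]] = {
    order: make_ordering(order) for order in all_orderings(FIELDS)
}


def sort_by(records: List[WordCount], order: Ordering) -> List[WordCount]:
    """Sort records using the comparator generated for `order`."""
    compare, _ = ORDERINGS[order]
    return sorted(records, key=cmp_to_key(compare))
```

For example, `sort_by([WordCount("b", 2), WordCount("a", 3), WordCount("a", 1)], ("count", "word"))` returns the records with counts 1, 2 and 3, in that order. Keeping each hash function restricted to the same fields as its comparator is the point of generating them in pairs: the two can never disagree about which records are "the same key".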

It can't be considered a drop-in replacement for the record stuff;
you've already got C++ support and complex record types.  I don't
know if it even makes sense to try to integrate the code I have, or  
if it should just serve as a proof of concept for a feature.
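The count-ordered serialization mentioned in the quoted message (storing only the differences between adjacent counts) is ordinary gap, or delta, encoding. A minimal Python sketch, assuming non-negative counts already sorted in ascending order and a LEB128-style variable-length integer format; that byte format and the function names are illustrative assumptions, not what TypeBuilder actually writes:

```python
from typing import List, Tuple


def write_varint(value: int, out: bytearray) -> None:
    """Append a non-negative int as an unsigned LEB128 varint (7 bits/byte)."""
    if value < 0:
        raise ValueError("varint requires a non-negative value")
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one varint starting at `pos`; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_sorted_counts(counts: List[int]) -> bytes:
    """Write a length prefix, then the gap between each count and the last."""
    out = bytearray()
    write_varint(len(counts), out)
    prev = 0
    for c in counts:
        if c < prev:
            raise ValueError("counts must be non-negative and sorted ascending")
        write_varint(c - prev, out)  # small gap instead of the full value
        prev = c
    return bytes(out)


def decode_sorted_counts(data: bytes) -> List[int]:
    """Invert encode_sorted_counts by summing the gaps back up."""
    n, pos = read_varint(data, 0)
    counts = []
    prev = 0
    for _ in range(n):
        delta, pos = read_varint(data, pos)
        prev += delta
        counts.append(prev)
    return counts
```

With counts `[1000, 1001, 1003, 1010]` the stored gaps are 1000, 1, 2 and 7, so the payload is 6 bytes including the one-byte length prefix, against 9 bytes if each count were written in full as a varint. The gaps are only small because the records are sorted by count, which is why this serialization belongs to the count ordering and not to the others.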

