hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Combining MapReduce implementations
Date Mon, 16 Oct 2006 17:43:14 GMT
The usage scenario I imagine is that one allocates all the hosts in a
cluster, runs some MapReduce computations, then de-allocates all the
hosts.  I didn't think one would normally re-size the cluster in the
middle of a computation.  Why would you want to do that?


Lee wrote:
> I was also contemplating EC2 in regard to Hadoop.  One of the issues I
> was thinking of was, assuming you are dynamically allocating and
> deallocating hosts, would you need to be careful of how fast you
> released hosts?  Is there currently any graceful way of letting Hadoop
> deal with removing a host, i.e., letting any MapReduce tasks finish and
> moving chunks to another box?
> Lee
> On 10/11/06, Doug Cutting <cutting@apache.org> wrote:
>> Trevor Strohman wrote:
>> > Grid Engine: All the machines available to me run Sun's Grid Engine
>> > for job submission.  Grid Engine is important for us, because it
>> > makes sure that all of the users of a cluster get their fair share of
>> > resources--as far as I can tell, the JobTracker assumes that one user
>> > owns the machines.  Is this a shared scenario you're interested in
>> > supporting?
>> Yes.  We'd like Hadoop's MapReduce to be able to live on top of such
>> systems.  Some are already experimenting with Hadoop on Condor, but
>> I've not yet heard of anyone using Hadoop on Sun's Grid Engine.
>> http://issues.apache.org/jira/browse/HADOOP-428
>> http://www.cs.wisc.edu/condor/CondorWeek2006/presentations/paranjpye_yahoo_condor.ppt

>> > Would you consider supporting job submission systems like Grid
>> > Engine or Condor?
>> Definitely.  I'm also interested in supporting Amazon's EC2, since it
>> removes the need to purchase and maintain a cluster.  In particular,
>> for many applications, Amazon's prices seem considerably cheaper than
>> operating one's own cluster.
>> > Record I/O: [...]
>> > and my TypeBuilder class generates code for all possible orderings of
>> > this class (order by word, order by count, order by word then count,
>> > order by count then word).  Each ordering has its own hash function and
>> > comparator.
>> >
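[Editor's note: per-ordering comparators of the kind Trevor describes can be
sketched roughly as below.  This is a hypothetical illustration, not output of
his TypeBuilder; the Orderings and WordCount class names and fields are
assumptions.]

```java
import java.util.*;

public class Orderings {
    // Hypothetical record pairing a word with its occurrence count.
    static class WordCount {
        final String word;
        final int count;
        WordCount(String word, int count) { this.word = word; this.count = count; }
    }

    // One comparator per possible ordering: by word, by count, and the two
    // lexicographic combinations.  Each could pair with its own hash function.
    static final Comparator<WordCount> BY_WORD =
        Comparator.comparing(w -> w.word);
    static final Comparator<WordCount> BY_COUNT =
        Comparator.comparingInt(w -> w.count);
    static final Comparator<WordCount> BY_WORD_THEN_COUNT =
        Comparator.<WordCount, String>comparing(w -> w.word)
                  .thenComparingInt(w -> w.count);
    static final Comparator<WordCount> BY_COUNT_THEN_WORD =
        Comparator.<WordCount>comparingInt(w -> w.count)
                  .thenComparing(w -> w.word);

    public static void main(String[] args) {
        List<WordCount> data = new ArrayList<>(Arrays.asList(
            new WordCount("the", 12), new WordCount("a", 12),
            new WordCount("of", 7)));
        data.sort(BY_COUNT_THEN_WORD);
        for (WordCount w : data) System.out.println(w.word + " " + w.count);
    }
}
```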
>> > In addition, each ordering has its own serialization/deserialization
>> > code.  For example, if we order by count, the serialization code stores
>> > only differences between adjacent counts to help with compression.
>> >
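[Editor's note: the difference-only serialization Trevor mentions is a form of
delta encoding.  A minimal sketch, not his actual Record I/O code; the
DeltaCounts class and method names are assumptions.]

```java
import java.util.Arrays;

public class DeltaCounts {
    // Store the first count, then only the differences between adjacent
    // counts.  For count-ordered records the deltas are small, non-negative
    // integers, which a variable-length or entropy coder compresses well.
    static int[] encode(int[] sortedCounts) {
        int[] deltas = new int[sortedCounts.length];
        int prev = 0;
        for (int i = 0; i < sortedCounts.length; i++) {
            deltas[i] = sortedCounts[i] - prev;
            prev = sortedCounts[i];
        }
        return deltas;
    }

    // Decoding is a running sum over the deltas.
    static int[] decode(int[] deltas) {
        int[] counts = new int[deltas.length];
        int prev = 0;
        for (int i = 0; i < deltas.length; i++) {
            prev += deltas[i];
            counts[i] = prev;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = {3, 7, 7, 12, 40};
        System.out.println(Arrays.toString(encode(counts)));          // [3, 4, 0, 5, 28]
        System.out.println(Arrays.toString(decode(encode(counts))));  // [3, 7, 7, 12, 40]
    }
}
```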
>> > Is this code you'd be interested in?
>> Yes, this sounds very interesting.  Does it build on the Record IO
>> classes or is it completely separate?
>> Thanks,
>> Doug
