Date: Wed, 05 Dec 2007 09:55:05 -0800
Subject: Re: Comparing Hadoop to Apple Xgrid?
From: Ted Dunning <tdunning@veoh.com>
To: <hadoop-user@lucene.apache.org>


I just read the Xgrid page, and it is clear that Apple has pushed on the
following parameters (they may be doing lots of other cool stuff that I
don't know about):

A) auto-configuration
B) wider distribution of computation
C) local checkpointing of processes for restarts

What they have apparently not done includes:

X) map/reduce
Y) magic process restarts in the face of failure (see map/reduce)
Z) distributed file system

When newbies try to run Hadoop, they ALWAYS seem to run headlong into the
lack of (A) (how many times has somebody essentially said "I have a totally
screwed up DNS and Hadoop won't run"?).
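
For anyone hitting the DNS problem, here is a minimal sketch of a sanity check, assuming the usual symptom is a hostname that does not resolve to an address and back to the same name. The function name check_dns_roundtrip and its injectable forward/reverse parameters are my own invention for illustration, not anything Hadoop ships; only the socket calls are real standard-library functions.

```python
import socket


def check_dns_roundtrip(hostname, forward=socket.gethostbyname,
                        reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP and back again; return (ok, detail).

    forward and reverse default to the real resolver calls but can be
    swapped out, so the check can be exercised without a live DNS.
    """
    try:
        ip = forward(hostname)
    except OSError as err:  # socket.gaierror is a subclass of OSError
        return False, f"forward lookup of {hostname} failed: {err}"
    try:
        # gethostbyaddr returns (primary name, alias list, address list)
        name, aliases, _addrs = reverse(ip)
    except OSError as err:  # socket.herror is a subclass of OSError
        return False, f"reverse lookup of {ip} failed: {err}"
    names = {name.lower(), *(alias.lower() for alias in aliases)}
    if hostname.lower() not in names:
        return False, f"{hostname} -> {ip} -> {name} (names do not match)"
    return True, f"{hostname} -> {ip} -> {name}"
```

Calling check_dns_roundtrip(socket.getfqdn()) on each node would be one way to spot a broken setup before starting any daemons. A clean result here does not guarantee that Hadoop will be happy; it only rules out one common class of misconfiguration.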

Item (B) is probably a bad thing for Hadoop, given the bandwidth required for
the shuffle phase.
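
To put a rough number on the shuffle concern, here is a back-of-the-envelope sketch. It assumes map output is spread evenly over the nodes and that each node hosts an equal share of the reducers, so only 1/N of the data stays local. The function names and the example figures are hypothetical, chosen only for illustration; real jobs will be skewed and real links are shared.

```python
def shuffle_cross_network_bytes(map_output_bytes, num_nodes):
    """Bytes that must leave the node they were produced on during the shuffle.

    Assumes an even spread of map output and reducers across nodes, so a
    fraction (num_nodes - 1) / num_nodes of the data crosses the network.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return map_output_bytes * (num_nodes - 1) / num_nodes


def shuffle_seconds(map_output_bytes, num_nodes, link_bytes_per_sec):
    """Optimistic lower bound on shuffle time.

    Assumes every node sends its share concurrently at full link speed,
    with no contention in the network core.
    """
    total = shuffle_cross_network_bytes(map_output_bytes, num_nodes)
    per_node = total / num_nodes
    return per_node / link_bytes_per_sec


# Hypothetical job: 1 TB of map output on 100 nodes.
lan = shuffle_seconds(10**12, 100, 125e6)  # gigabit Ethernet, 125 MB/s
wan = shuffle_seconds(10**12, 100, 1e6)    # assumed 1 MB/s wide-area uplink
print(f"LAN: {lan:.0f} s, WAN: {wan / 3600:.2f} h")
```

Under these assumptions the same shuffle takes a little over a minute on a gigabit LAN and a few hours over 1 MB/s uplinks, which is the kind of gap that makes wide distribution unattractive for this workload.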

Item (C) is inherent in map/reduce and is pretty neutral either way.


On 12/5/07 9:23 AM, "Ted Dunning" <tdunning@veoh.com> wrote:

> 
> 
> Sorry about not addressing this. (and I appreciate your gentle prod)
> 
> The Xgrid would likely work well on these problems.  They are, after all,
> nearly trivial to parallelize because of clean communication patterns.
> 
> Consider an alternative problem of solving n-body gravitational dynamics for
> n > 10^6 bodies.  Here there is nearly universal communication.
> 
> As another example, last week I heard from some Sun engineers that one of
> their HPC systems had to satisfy a requirement for checkpointing large
> numerical computations in which a large number of computational nodes were
> required to dump tens of TB of checkpoint data to disk in less than 10
> seconds.
> 
> Finally, many of these HPC systems are designed to fit the entire working
> set into memory so that high numerical computational throughput can be
> maintained.  In this regime, communications have to work on memory
> time-scales rather than disk time-scales.
> 
> None of these three example problems are very suitable for Hadoop.
> 
> The sample problems you gave are a different matter.
> 
> 
> On 12/5/07 2:04 AM, "Bob Futrelle" <bob.futrelle@gmail.com> wrote:
> 
>> why an Xgrid cluster with its attendant management system
>> would or would not be equally good for these problems
>