hadoop-common-user mailing list archives

From "Bob Futrelle" <bob.futre...@gmail.com>
Subject Re: Comparing Hadoop to Apple Xgrid?
Date Wed, 05 Dec 2007 17:59:54 GMT
All this feedback is informative and valuable -- Thanks!

 - Bob Futrelle
   Northeastern U.


On Dec 5, 2007 12:55 PM, Ted Dunning <tdunning@veoh.com> wrote:
>
> I just read the Xgrid page, and it is clear that Apple has pushed on the
> following parameters (they may be doing lots of other cool stuff that I
> don't know about):
>
> A) auto-configuration
> B) wider distribution of computation
> C) local checkpointing of processes for restarts
>
> What they have apparently not done includes
>
> X) map/reduce
> Y) magic process restarts in the face of failure (see map/reduce)
> Z) distributed file system
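A toy sketch of the map/reduce model that item (X) refers to; this is not Hadoop's actual API, just the shape of the computation: map emits key/value pairs, a shuffle groups them by key, and a reduce folds each group.

```python
from collections import defaultdict

def map_phase(documents):
    # Mapper: emit a (word, 1) pair for every word in every input record.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    # Shuffle: group intermediate pairs by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reducer: fold each group (here, sum the counts for each word).
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(map_phase(["hadoop xgrid hadoop", "xgrid"]))
print(counts)  # → {'hadoop': 2, 'xgrid': 2}
```

The point of the pattern for (Y) below is that each map or reduce task is a pure function of its input split, so a failed task can simply be re-run elsewhere.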
>
> When newbies try to run Hadoop, they ALWAYS seem to run head-long into the
> lack of (A) (how many times has somebody essentially said "I have a totally
> screwed-up DNS and Hadoop won't run"?).
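For illustration, a minimal check of the kind of hostname resolution Hadoop's daemons depend on. The function name and the loopback heuristic are this editor's assumptions for the sketch, not anything Hadoop ships:

```python
import socket

def dns_sanity_check():
    # Hadoop daemons expect the local hostname to resolve, and a hostname
    # that resolves only to loopback is a classic misconfiguration: other
    # nodes will be told to connect to 127.x.x.x.
    hostname = socket.gethostname()
    try:
        addr = socket.gethostbyname(hostname)
    except socket.gaierror:
        return False, f"{hostname} does not resolve at all"
    if addr.startswith("127."):
        return False, f"{hostname} resolves to loopback ({addr})"
    return True, f"{hostname} -> {addr}"

ok, detail = dns_sanity_check()
print(("OK" if ok else "BROKEN") + ": " + detail)
```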
>
> Item (B) is probably a bad thing for Hadoop, given the bandwidth the
> shuffle phase requires.
>
> Item (C) is inherent in map-reduce and is pretty neutral either way.
>
>
> On 12/5/07 9:23 AM, "Ted Dunning" <tdunning@veoh.com> wrote:
>
> >
> >
> > Sorry about not addressing this (and I appreciate your gentle prod).
> >
> > The Xgrid would likely work well on these problems.  They are, after all,
> > nearly trivial to parallelize because of clean communication patterns.
> >
> > Consider an alternative problem: solving n-body gravitational dynamics for
> > n > 10^6 bodies.  Here communication is nearly universal, since every
> > body's motion depends on every other body's position.
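A naive direct-sum step makes that communication pattern concrete: computing any one body's acceleration requires every other body's position and mass, so the data movement is all-to-all. This is a sketch only; real n-body codes use tree or fast-multipole methods to cut the O(n^2) cost.

```python
def accelerations(positions, masses, G=1.0, eps=1e-3):
    # Direct-sum gravity: body i's acceleration sums a contribution from
    # EVERY other body j, which is what makes the problem all-to-all.
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + eps * eps  # softening avoids r = 0
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += G * masses[j] * dx[k] * inv_r3
    return acc

# Two unit masses one unit apart attract each other along x.
acc = accelerations([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0])
print(acc[0][0] > 0 and acc[1][0] < 0)  # → True
```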
> >
> > As another example, last week I heard from some Sun engineers that one of
> > their HPC systems had to satisfy a requirement for checkpointing large
> > numerical computations: a large number of computational nodes were
> > required to dump tens of TB of checkpoint data to disk in less than 10
> > seconds.
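Back-of-envelope arithmetic shows why that requirement is severe. The data volume and time window come from the anecdote above; the node count is a hypothetical chosen for illustration:

```python
# Figures from the requirement above; the node count is a hypothetical.
checkpoint_tb = 10            # low end of "tens of TB"
window_s = 10                 # seconds allowed for the dump
nodes = 1000                  # hypothetical cluster size

aggregate_gb_per_s = checkpoint_tb * 1024 / window_s
per_node_mb_per_s = aggregate_gb_per_s * 1024 / nodes

print(f"aggregate: {aggregate_gb_per_s:.0f} GB/s")  # → aggregate: 1024 GB/s
print(f"per node:  {per_node_mb_per_s:.0f} MB/s")   # → per node:  1049 MB/s
```

Roughly 1 GB/s of sustained write bandwidth per node, which is far beyond what a commodity disk of that era could deliver.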
> >
> > Finally, many of these HPC systems are designed to fit the entire working
> > set into memory so that high numerical computational throughput can be
> > maintained.  In this regime, communications have to work on memory
> > time-scales rather than disk time-scales.
> >
> > None of these three example problems is a good fit for Hadoop.
> >
> > The sample problems you gave are a different matter.
> >
> >
> > On 12/5/07 2:04 AM, "Bob Futrelle" <bob.futrelle@gmail.com> wrote:
> >
> >> why an Xgrid cluster with its attendant management system
> >> would or would not be equally good for these problems
> >
>
>
