hadoop-common-user mailing list archives

From "Bob Futrelle" <bob.futre...@gmail.com>
Subject Re: Comparing Hadoop to Apple Xgrid?
Date Wed, 05 Dec 2007 10:41:27 GMT
For the record, here's the Apple Xgrid (hype?) page:


 - Bob

On Dec 5, 2007 5:04 AM, Bob Futrelle <bob.futrelle@gmail.com> wrote:
> You've written a spirited statement about the strengths of Hadoop.
> But I'd still be interested in hearing from someone who might
> understand why an Xgrid cluster with its attendant management system
> would or would not be equally good for these problems. After all,
> there are a reasonable number of Xgrid customers who are getting their
> work done.
> Maybe I'll need to learn more about both and also engage in some
> discussions with the Xgrid community. I do intend to bring up the
> Xgrid system on our cluster to see how it works for us.  That'll
> certainly deepen my understanding of both.
> Thanks for the detailed reply.
>  - Bob
> On Dec 5, 2007 12:17 AM, Ted Dunning <tdunning@veoh.com> wrote:
> >
> > If you are looking at large numbers of independent images then Hadoop should
> > be close to perfect for this analysis (the problem is embarrassingly
> > parallel).  If you are looking at video, then you can still do quite well by
> > building what is essentially a probabilistic list of recognized items in the
> > video stream in the map phase, giving all frames from a single shot the same
> > reduce key.  Then in the reduce phase, you can correlate the possible
> > objects and their probabilities according to object persistence models.  It
> > would be good to do another pass after that to do scene-to-scene
> > correlations.  This formulation gives you near perfect parallelism as well.
> >
> > For NLP, the problem at the level of phrasal analysis can also be made
> > trivially parallel if you have large numbers of documents.  Again, you may
> > need to do a secondary pass to find duplicated references across multiple
> > documents but this is usually far less intensive than the original analysis.
> >
> > Standard scientific HPC architectures are all about facilitating arbitrary
> > communication patterns and process boundaries.  This is exceedingly hard to
> > do really well and few systems attain really good performance.  Hadoop is
> > all about working with a really simple primitive that is so simple that it
> > can be implemented really well with simple and cheap hardware.  What is
> > surprising (a bit) is that so many problems can be well expressed as
> > map-reduce programs.  Sometimes this is only true at really large scale
> > where correlations become small (allowing the map phase to do useful work on
> > many sub-units), sometimes it requires relatively large intermediate data
> > (such as many graph algorithms).  The fact is, however, that it works
> > remarkably well.
> >
> >
> > On 12/4/07 7:12 PM, "Bob Futrelle" <bob.futrelle@gmail.com> wrote:
> >
> > > For us, we want to do pattern recognition, turning
> > > raster images into collections of the objects we discover in the
> > > images. Another focus for us is NLP, esp. phrasal analysis.
> >
> >
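
A minimal sketch of the video formulation Ted describes above, written as
plain in-memory Python rather than against the Hadoop API. The frame records,
the "detections" field (standing in for the output of some recognizer), and
the averaging rule used as a stand-in "object persistence model" are all
assumptions made for illustration; none of them comes from the thread.

```python
from collections import defaultdict


def map_frame(frame):
    """Map phase: emit one (shot, detections) pair per frame.

    frame["detections"] is a hypothetical {label: probability} dict that
    stands in for whatever a real recognizer would produce.
    """
    yield frame["shot"], frame["detections"]


def shuffle(pairs):
    """Group mapped values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_shot(shot, per_frame_detections):
    """Reduce phase: combine per-frame probabilities for one shot.

    Assumed persistence model: average each label's probability over all
    frames in the shot, so objects that persist across frames score higher
    than objects seen in only one frame.
    """
    n_frames = len(per_frame_detections)
    totals = defaultdict(float)
    for detections in per_frame_detections:
        for label, prob in detections.items():
            totals[label] += prob
    return shot, {label: total / n_frames for label, total in totals.items()}


def run(frames):
    """Run map, shuffle, and reduce over a list of frame records."""
    pairs = (pair for frame in frames for pair in map_frame(frame))
    grouped = shuffle(pairs)
    return dict(reduce_shot(shot, values) for shot, values in grouped.items())


# Hypothetical sample input: two frames from shot "s1", one from shot "s2".
frames = [
    {"shot": "s1", "detections": {"car": 0.75, "dog": 0.5}},
    {"shot": "s1", "detections": {"car": 0.25}},
    {"shot": "s2", "detections": {"tree": 0.5}},
]
result = run(frames)
```

With this sample input, "car" appears in both frames of shot s1 and averages
to 0.5, while "dog" appears in only one of the two frames and is discounted
to 0.25. Because each shot is reduced independently, the shots can be
processed in parallel; a scene-to-scene pass would be a second job over these
per-shot results.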
