hadoop-common-dev mailing list archives

From Dan Creswell <dan.cresw...@lonecrusader.co.uk>
Subject Re: JavaSpaces (Blitz?) and hadoop - comparison?
Date Thu, 01 Mar 2007 14:00:08 GMT
Tom White wrote:
> Good question. JavaSpaces is actually very general, so I would ask how
> the Replicated Worker Pattern, which fits nicely with JavaSpaces
> (http://today.java.net/pub/a/today/2005/04/21/farm.html,
> https://computeserver.dev.java.net/) compares with Hadoop and
> MapReduce.
> My take is that at a high level JavaSpaces RWP is good for
> distributing jobs that don't operate on large datasets whereas Hadoop
> (MapReduce) is good for operating on very large datasets. JavaSpaces
> doesn't really have mechanisms for distributing large quantities of
> data in the way that HDFS does. On the other hand, JavaSpaces is good
Indeed, JavaSpaces doesn't provide support for shipping around large
quantities of data. However, I wouldn't usually pass this kind of data
through the JavaSpace; I pass a reference to those chunks of data instead.

This reference can be to some filesystem or another, an http or ftp URI etc.

In loose terms I'd use the JavaSpace to co-ordinate the MapReduce effort
whilst having some other infrastructure element handle the
access/distribution of data (which I guess could be HDFS?)
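
As a rough illustration of that split, here is a minimal Java sketch. It is
not Blitz or JavaSpaces code: a BlockingQueue stands in for the space (put
playing the role of write, poll the role of take), and the names ChunkTask,
ChunkResult, process and the hdfs://namenode.example URIs are all made up for
the example. The point is only that the entries carry a reference to each
chunk, never the chunk itself.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class SpaceCoordinationSketch {

    // Hypothetical task entry: holds only a reference to a chunk, not its bytes.
    static final class ChunkTask {
        final URI location;
        ChunkTask(URI location) { this.location = location; }
    }

    // Hypothetical result entry written back by a worker.
    static final class ChunkResult {
        final URI location;
        final String worker;
        ChunkResult(URI location, String worker) {
            this.location = location;
            this.worker = worker;
        }
    }

    // Hands the chunk references to `workers` threads and collects one result per chunk.
    static List<ChunkResult> process(List<URI> chunks, int workers)
            throws InterruptedException {
        // Stand-ins for the space (assumption: a real setup would use a JavaSpace).
        BlockingQueue<ChunkTask> tasks = new LinkedBlockingQueue<>();
        BlockingQueue<ChunkResult> results = new LinkedBlockingQueue<>();
        for (URI chunk : chunks) {
            tasks.put(new ChunkTask(chunk));
        }

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            final String name = "worker-" + i;
            pool.submit(() -> {
                ChunkTask task;
                // Non-blocking poll: a worker stops once no tasks remain.
                while ((task = tasks.poll()) != null) {
                    // A real worker would fetch the data behind task.location here
                    // (HDFS, HTTP, FTP, ...) and run its map or reduce step on it.
                    results.add(new ChunkResult(task.location, name));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new ArrayList<>(results);
    }

    public static void main(String[] args) throws InterruptedException {
        List<URI> chunks = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            chunks.add(URI.create("hdfs://namenode.example/logs/part-" + i));
        }
        for (ChunkResult r : process(chunks, 2)) {
            System.out.println(r.worker + " handled " + r.location);
        }
    }
}
```

Which worker picks up which chunk is not deterministic, so the printed order
will vary from run to run; only the number of results is fixed.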

I wonder whether, under the covers, Hadoop has a similar division of
responsibility between co-ordination and data distribution?
> for sharing modest sized data objects - with MapReduce you are sharing
> data, so you have to think carefully how to encode the data that the
> map and reduce tasks operate on. JavaSpaces in general allows you a
> richer computational model (compared to MapReduce) - but this
> generality comes at the price of not performing as well for
> certain classes of application.
Based on what I say above, I'd refine this statement: if you want to use
JavaSpaces alone to solve the entire problem, it may not perform well,
but if you use JavaSpaces as part of a complete solution, it will
perform pretty well.
> Put another way: JavaSpaces RWP is a good fit for writing a program to
> calculate if a large number is prime (since the subtasks don't need to
> use much intermediate data), whereas Hadoop MapReduce is a good fit
> for counting web server access hits by host (since the object is to
> analyse a large set of data).
> (I think they're both great pieces of technology BTW.)
Me too!


> Tom
> On 01/03/07, Dan Creswell <dan.creswell@lonecrusader.co.uk> wrote:
>> Hi,
>> I'm the author of Blitz as it happens :)
>> Blitz has various different modes of operation.  It can be persistent
>> but also can operate in "memory-only" configuration.
>> The basic difference would be that Blitz is just a core element around
>> which you could build a MapReduce implementation (in fact I have done in
>> the past) etc. Hadoop by contrast has a core of its own based on a GFS
>> equivalent and also includes the framework for MapReduce etc.
>> In conclusion, Blitz provides the base of a stack and Hadoop has that
>> base (in another form) plus additional layers (MapReduce etc).
>> Hope that helps,
>> Dan.
>> Tomi N/A wrote:
>> > I just came across a technology which sounded interesting, but doesn't
>> > seem very widespread, called JavaSpaces. An implementation is Blitz
>> > (http://www.dancres.org/blitz/).
>> >
>> > From what I see, it seems to be a distributed computation and
>> > persistence engine so it made me wonder if anyone on this list would
>> > know anything about it and, maybe, compare the two technologies. Well?
>> > :)
>> >
>> > Cheers,
>> > t.n.a.
>> >
