hadoop-common-dev mailing list archives

From Dan Creswell <dan.cresw...@lonecrusader.co.uk>
Subject Re: JavaSpaces (Blitz?) and hadoop - comparison?
Date Fri, 02 Mar 2007 08:50:55 GMT
Nigel Daley wrote:
> One more difference...
>
> Because JavaSpaces is a Jini service, clients can discover its host/port
> dynamically at run time (and rediscover it elsewhere if it fails).
> OTOH, Hadoop servers and clients are currently pre-configured with
> necessary host/ports.
>
Indeed. I have been toying with doing something about removing this
pre-configuration. Would that be worthwhile?

Dan.
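
To make the contrast between run-time discovery and pre-configured host/ports concrete, here is a minimal sketch. It is an assumption-laden illustration only: `LookupRegistry`, `Endpoint`, `resolve`, the service name and the host names are all hypothetical, and nothing here is the Jini lookup API or Hadoop's configuration code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int


class LookupRegistry:
    """Hypothetical in-memory stand-in for a lookup service queried at run time."""

    def __init__(self):
        self._services = {}

    def register(self, service_type, endpoint):
        self._services[service_type] = endpoint

    def unregister(self, service_type):
        self._services.pop(service_type, None)

    def lookup(self, service_type):
        return self._services.get(service_type)


# Pre-configured style: the client carries a fixed address (made-up values).
STATIC_CONFIG = {"namenode": Endpoint("master-1.example", 9000)}


def resolve(service_type, registry=None):
    """Prefer a run-time lookup; fall back to the static configuration."""
    if registry is not None:
        found = registry.lookup(service_type)
        if found is not None:
            return found
    return STATIC_CONFIG[service_type]


registry = LookupRegistry()
registry.register("namenode", Endpoint("master-1.example", 9000))
first = resolve("namenode", registry)

# Simulate a failure, then the service reappearing on another host.
registry.unregister("namenode")
registry.register("namenode", Endpoint("master-2.example", 9000))
second = resolve("namenode", registry)
```

With the registry in place the second call follows the service to its new host, whereas a client that only has `STATIC_CONFIG` keeps pointing at the original address.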

> Nige
>
> On Mar 1, 2007, at 6:00 AM, Dan Creswell wrote:
>
>> Tom White wrote:
>>> Good question. JavaSpaces is actually very general, so I would ask how
>>> the Replicated Worker Pattern, which fits nicely with JavaSpaces
>>> (http://today.java.net/pub/a/today/2005/04/21/farm.html,
>>> https://computeserver.dev.java.net/) compares with Hadoop and
>>> MapReduce.
>>>
>>> My take is that at a high level JavaSpaces RWP is good for
>>> distributing jobs that don't operate on large datasets whereas Hadoop
>>> (MapReduce) is good for operating on very large datasets. JavaSpaces
>>> doesn't really have mechanisms for distributing large quantities of
>>> data in the way that HDFS does. On the other hand, JavaSpaces is good
>> Indeed, JavaSpaces doesn't provide support for shipping around large
>> quantities of data. However, I wouldn't usually pass this kind of data
>> through the JavaSpace; I pass a reference to those chunks of data.
>>
>> This reference can point into some filesystem or other, or be an HTTP or
>> FTP URI, etc.
>>
>> In loose terms I'd use the JavaSpace to co-ordinate the MapReduce effort
>> whilst having some other infrastructure element handle the
>> access/distribution of data (which I guess could be HDFS?)
>>
>> I wonder whether, under the covers, Hadoop has a similar division of
>> responsibility between co-ordination and data distribution?
>>> for sharing modest sized data objects - with MapReduce you are sharing
>>> data, so you have to think carefully how to encode the data that the
>>> map and reduce tasks operate on. JavaSpaces in general allows you a
>>> richer computational model (compared to MapReduce) - but this
>>> generality comes at the price of performance for certain classes of
>>> application.
>>>
>> Based on what I say above, I'd refine this statement: if you use
>> JavaSpaces alone to solve the entire problem, it may not perform well,
>> but if you use JavaSpaces as part of a complete solution, it will
>> perform pretty well.
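
The division of labour described above (the space co-ordinates, another store holds the data) can be sketched roughly as follows. This is a toy under stated assumptions: `Space` is a hypothetical in-memory stand-in rather than the JavaSpaces API, a temporary directory plays the role of the external data store, and counting lines stands in for the real per-chunk work.

```python
import queue
import tempfile
import threading
from pathlib import Path


class Space:
    """Hypothetical stand-in for a space: write() adds an entry, take() removes one."""

    def __init__(self):
        self._entries = queue.Queue()

    def write(self, entry):
        self._entries.put(entry)

    def take(self, timeout=0.2):
        try:
            return self._entries.get(timeout=timeout)
        except queue.Empty:
            return None


def worker(tasks, results):
    """Replicated worker: take a task, fetch the data it refers to, write a result."""
    while True:
        task = tasks.take()
        if task is None:
            return  # no work left
        # Only the reference travelled through the space; the data is read here.
        text = Path(task["ref"]).read_text()
        results.write({"ref": task["ref"], "lines": len(text.splitlines())})


with tempfile.TemporaryDirectory() as store:
    tasks, results = Space(), Space()
    for i, body in enumerate(["a\nb\n", "c\n", "d\ne\nf\n"]):
        chunk = Path(store) / f"chunk-{i}.txt"
        chunk.write_text(body)
        tasks.write({"ref": str(chunk)})  # small entry, large data stays in the store

    workers = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # Combine the per-chunk results taken back out of the space.
    total_lines = 0
    while (entry := results.take(timeout=0.1)) is not None:
        total_lines += entry["lines"]
```

The entries stay small regardless of chunk size, which is the point of passing references rather than the data itself.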
>>> Put another way: JavaSpaces RWP is a good fit for writing a program to
>>> calculate if a large number is prime (since the subtasks don't need to
>>> use much intermediate data), whereas Hadoop MapReduce is a good fit
>>> for counting web server access hits by host (since the object is to
>>> analyse a large set of data).
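
As an illustration of the access-hits example, here is a minimal single-process sketch of the map and reduce steps. It is not Hadoop's API: the function names and sample log lines are made up, and it assumes the host is the first whitespace-separated field of each line.

```python
from collections import defaultdict


def map_line(line):
    """Map step: emit (host, 1) for one access-log line."""
    host = line.split(maxsplit=1)[0]
    yield host, 1


def reduce_counts(host, counts):
    """Reduce step: sum the counts emitted for one host."""
    return host, sum(counts)


def count_hits(lines):
    # Group the mapped pairs by key, as the shuffle phase would.
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        for host, one in map_line(line):
            groups[host].append(one)
    return dict(reduce_counts(host, counts) for host, counts in groups.items())


# Made-up sample data for illustration only.
sample_log = [
    '10.0.0.1 - - [01/Mar/2007:06:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Mar/2007:06:00:01 +0000] "GET /a HTTP/1.1" 200 128',
    '10.0.0.1 - - [01/Mar/2007:06:00:02 +0000] "GET /b HTTP/1.1" 404 64',
]
hits = count_hits(sample_log)
```

In a real deployment the value of the framework lies in running the map step close to the data across many machines; this sketch only shows the shape of the computation.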
>>>
>>> (I think they're both great pieces of technology BTW.)
>>>
>> Me too!
>>
>> Dan.
>>
>>> Tom
>>>
>>> On 01/03/07, Dan Creswell <dan.creswell@lonecrusader.co.uk> wrote:
>>>> Hi,
>>>>
>>>> I'm the author of Blitz as it happens :)
>>>>
>>>> Blitz has various modes of operation.  It can be persistent
>>>> but can also operate in a "memory-only" configuration.
>>>>
>>>> The basic difference would be that Blitz is just a core element around
>>>> which you could build a MapReduce implementation (in fact I have done so
>>>> in the past) etc. Hadoop by contrast has a core of its own based on a
>>>> GFS equivalent and also includes the framework for MapReduce etc.
>>>>
>>>> In conclusion, Blitz provides the base of a stack and Hadoop has that
>>>> base (in another form) plus additional layers (MapReduce etc).
>>>>
>>>> Hope that helps,
>>>>
>>>> Dan.
>>>>
>>>> Tomi N/A wrote:
>>>>> I just came across a technology called JavaSpaces, which sounded
>>>>> interesting but doesn't seem very widespread. An implementation is Blitz
>>>>> (http://www.dancres.org/blitz/).
>>>>>
>>>>> From what I see, it seems to be a distributed computation and
>>>>> persistence engine so it made me wonder if anyone on this list would
>>>>> know anything about it and, maybe, compare the two technologies.
>>>>> Well? :)
>>>>>
>>>>> Cheers,
>>>>> t.n.a.
>>>>>
>>>>
>>>>
>>>
>>
>
>

