From: Dan Creswell
Date: Thu, 01 Mar 2007 14:00:08 +0000
To: hadoop-dev@lucene.apache.org
Message-ID: <45E6DC68.4030200@lonecrusader.co.uk>
Subject: Re: JavaSpaces (Blitz?) and hadoop - comparison?

Tom White wrote:
> Good question. JavaSpaces is actually very general, so I would ask how
> the Replicated Worker Pattern, which fits nicely with JavaSpaces
> (http://today.java.net/pub/a/today/2005/04/21/farm.html,
> https://computeserver.dev.java.net/) compares with Hadoop and
> MapReduce.
>
> My take is that at a high level JavaSpaces RWP is good for
> distributing jobs that don't operate on large datasets whereas Hadoop
> (MapReduce) is good for operating on very large datasets. JavaSpaces
> doesn't really have mechanisms for distributing large quantities of
> data in the way that HDFS does. On the other hand, JavaSpaces is good

Indeed, JavaSpaces doesn't provide support for shipping around large
quantities of data. However, I wouldn't usually pass this kind of data
through the JavaSpace; I pass a reference to those chunks of data. This
reference can be to some filesystem or another, an HTTP or FTP URI, etc.

In loose terms, I'd use the JavaSpace to co-ordinate the MapReduce effort
whilst having some other infrastructure element handle the
access/distribution of data (which I guess could be HDFS?).

I'm not sure, but doesn't Hadoop, under the covers, have a similar
division of responsibility between co-ordination and data distribution?

> for sharing modest sized data objects - with MapReduce you are sharing
> data, so you have to think carefully how to encode the data that the
> map and reduce tasks operate on.
> JavaSpaces in general allows you a
> richer computational model (compared to MapReduce) - but this
> generality comes at the price of performance for
> certain classes of application.
>

Based on what I said above, I'd refine this statement: if you want to use
JavaSpaces alone to solve the entire problem it may not perform well, but
if you use JavaSpaces as part of a complete solution it will perform
pretty well.

> Put another way: JavaSpaces RWP is a good fit for writing a program to
> calculate if a large number is prime (since the subtasks don't need to
> use much intermediate data), whereas Hadoop MapReduce is a good fit
> for counting web server access hits by host (since the object is to
> analyse a large set of data).
>
> (I think they're both great pieces of technology BTW.)
>

Me too!

Dan.

> Tom
>
> On 01/03/07, Dan Creswell wrote:
>> Hi,
>>
>> I'm the author of Blitz as it happens :)
>>
>> Blitz has various different modes of operation. It can be persistent
>> but can also operate in a "memory-only" configuration.
>>
>> The basic difference would be that Blitz is just a core element around
>> which you could build a MapReduce implementation (in fact I have done so
>> in the past) etc. Hadoop by contrast has a core of its own based on a GFS
>> equivalent and also includes the framework for MapReduce etc.
>>
>> In conclusion, Blitz provides the base of a stack and Hadoop has that
>> base (in another form) plus additional layers (MapReduce etc).
>>
>> Hope that helps,
>>
>> Dan.
>>
>> Tomi N/A wrote:
>> > I just came across a technology which sounded interesting, but doesn't
>> > seem very widespread, called JavaSpaces. An implementation is Blitz
>> > (http://www.dancres.org/blitz/).
>> >
>> > From what I see, it seems to be a distributed computation and
>> > persistence engine so it made me wonder if anyone on this list would
>> > know anything about it and, maybe, compare the two technologies. Well?
>> > :)
>> >
>> > Cheers,
>> > t.n.a.
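
To make the division of responsibility in the reply above a little more concrete, here is a minimal sketch of the idea, not anything taken from Blitz or Hadoop. It assumes a plain BlockingQueue as a stand-in for the JavaSpace and a Map as a stand-in for whatever holds the bulk data (HDFS, an HTTP server and so on). The class names, method names and the hdfs://example/... URIs are all hypothetical. The point is only that the entries passing through the "space" carry a small reference, while each worker fetches the chunk itself from the data store.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: the queue plays the role of the JavaSpace and the map
// plays the role of the bulk data store. Neither is a real JavaSpaces,
// Blitz or Hadoop API.
class ReferencePassingSketch {

    // Task entry written to the "space": a reference to a chunk, not the chunk.
    static final class TaskEntry {
        final URI chunk;
        TaskEntry(URI chunk) { this.chunk = chunk; }
    }

    // Small result entry a worker writes back.
    static final class ResultEntry {
        final URI chunk;
        final int wordCount;
        ResultEntry(URI chunk, int wordCount) {
            this.chunk = chunk;
            this.wordCount = wordCount;
        }
    }

    // The per-chunk ("map") step: count whitespace-separated words.
    static int countWordsIn(String data) {
        String trimmed = data.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Master: write one task per chunk, start the workers, sum their results.
    static int countWords(Map<URI, String> dataStore, int workers) {
        BlockingQueue<TaskEntry> space = new LinkedBlockingQueue<>();
        BlockingQueue<ResultEntry> results = new LinkedBlockingQueue<>();
        for (URI chunk : dataStore.keySet()) {
            space.add(new TaskEntry(chunk));
        }

        // Replicated worker: take a task, resolve the reference against the
        // data store, write back a small result. Stops when no tasks remain.
        Runnable worker = () -> {
            TaskEntry task;
            while ((task = space.poll()) != null) {
                String data = dataStore.get(task.chunk);
                results.add(new ResultEntry(task.chunk, countWordsIn(data)));
            }
        };

        Thread[] pool = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            pool[i] = new Thread(worker);
            pool[i].start();
        }
        try {
            for (Thread t : pool) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }

        // The "reduce" step: combine the partial counts.
        int total = 0;
        for (ResultEntry r : results) {
            total += r.wordCount;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<URI, String> store = Map.of(
                URI.create("hdfs://example/chunk-0"), "a b c",
                URI.create("hdfs://example/chunk-1"), "d e");
        System.out.println(countWords(store, 2));
    }
}
```

With a real JavaSpace the queue operations would presumably become write and take calls on the space, and the map lookup would become a read from the filesystem or URI the entry points at. The sketch keeps both in memory purely so that it is self-contained.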