Date: Wed, 05 Dec 2007 09:55:05 -0800
Subject: Re: Comparing Hadoop to Apple Xgrid?
From: Ted Dunning
Mailing list: hadoop-user@lucene.apache.org

I just read the Xgrid page, and it is clear that Apple has pushed on the following fronts (they may be doing lots of other cool stuff that I don't know about):

A) auto-configuration
B) wider distribution of computation
C) local checkpointing of processes for restarts

What they have apparently not done includes:

X) map/reduce
Y) magic process restarts in the face of failure (see map/reduce)
Z) a distributed file system

When newbies try to run Hadoop, they ALWAYS seem to run headlong into the lack of (A). How many times has somebody essentially said "I have a totally screwed-up DNS and Hadoop won't run"?

Item (B) is probably a bad thing for Hadoop, given the bandwidth required for the shuffle phase, where every reducer pulls its share of the map output from every mapper.

Item (C) is inherent in map-reduce (map output is materialized to local disk and failed tasks are simply re-executed from their input), so it is pretty neutral either way.

On 12/5/07 9:23 AM, "Ted Dunning" wrote:

>
> Sorry about not addressing this (and I appreciate your gentle prod).
>
> The Xgrid would likely work well on these problems. They are, after all,
> nearly trivial to parallelize because of clean communication patterns.
>
> Consider an alternative problem of solving n-body gravitational dynamics for
> n > 10^6 bodies. Here there is nearly universal communication.
>
> As another example, last week I heard from some Sun engineers that one of
> their HPC systems had to satisfy a requirement for checkpointing large
> numerical computations in which a large number of computational nodes were
> required to dump tens of TB of checkpoint data to disk in less than 10
> seconds.
>
> Finally, many of these HPC systems are designed to fit the entire working
> set into memory so that high numerical computational throughput can be
> maintained. In this regime, communications have to work on memory
> time-scales rather than disk time-scales.
>
> None of these three example problems are very suitable for Hadoop.
>
> The sample problems you gave are a different matter.
>
>
> On 12/5/07 2:04 AM, "Bob Futrelle" wrote:
>
>> why an Xgrid cluster with its attendant management system
>> would or would not be equally good for these problems
>
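Item (B)'s worry about shuffle bandwidth comes from the shape of the shuffle: every mapper may hold keys for every reducer, so traffic is all-to-all. The following is a minimal, single-process word-count sketch of that pattern, not Hadoop's implementation. The function names are hypothetical, there is no combiner, and `zlib.crc32` stands in for Hadoop's hash partitioner so results are stable across runs.

```python
from collections import Counter, defaultdict
import zlib

def map_words(split):
    """Map phase: emit (word, 1) for every word in one input split."""
    return [(word, 1) for line in split for word in line.split()]

def partition(key, num_reducers):
    """Hash partitioner: each key is owned by exactly one reducer.
    crc32 is used instead of hash() so the assignment is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(map_outputs, num_reducers):
    """Route every (key, value) pair to its reducer and tally how many
    records each (mapper, reducer) link carries."""
    inboxes = defaultdict(list)
    traffic = Counter()
    for mapper_id, pairs in enumerate(map_outputs):
        for key, value in pairs:
            reducer_id = partition(key, num_reducers)
            inboxes[reducer_id].append((key, value))
            traffic[(mapper_id, reducer_id)] += 1
    return inboxes, traffic

def reduce_counts(pairs):
    """Reduce phase: sum the values for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals

if __name__ == "__main__":
    splits = [["the quick brown fox"], ["the lazy dog"], ["the fox"]]
    map_outputs = [map_words(s) for s in splits]
    inboxes, traffic = shuffle(map_outputs, num_reducers=2)
    result = Counter()
    for pairs in inboxes.values():
        result.update(reduce_counts(pairs))
    print(dict(result))
    print("records shuffled:", sum(traffic.values()))
```

Without a combiner, every one of the 9 emitted records crosses the shuffle. Spreading mappers and reducers over a wider, slower network (item B) makes each of those (mapper, reducer) links more expensive.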
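Items (C) and (Y) both rest on the fact that a map task is deterministic and reads only its own input split, so recovering from a failure means discarding the attempt and re-running it rather than restoring a process checkpoint. This is a minimal sketch of that idea under those assumptions. `make_flaky` is a hypothetical helper that simulates a dying node, and the limit of 4 attempts is an arbitrary choice for illustration.

```python
def count_words(split):
    """A deterministic map task: the same split always gives the same output."""
    return sum(len(line.split()) for line in split)

def make_flaky(task, failures):
    """Hypothetical test helper: the first `failures` calls raise,
    standing in for a node that dies mid-task."""
    calls = {"n": 0}

    def wrapped(split):
        calls["n"] += 1
        if calls["n"] <= failures:
            raise RuntimeError("simulated node failure")
        return task(split)

    return wrapped

def run_with_retries(task, split, max_attempts=4):
    """Re-execute a failed task from its original input split.
    A failed attempt's partial work is simply thrown away; no process
    checkpoint is needed. Returns (result, attempts_used)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task(split), attempt
        except RuntimeError:
            if attempt == max_attempts:
                raise  # give up and surface the last failure

split = ["the quick brown fox", "the lazy dog"]
print(run_with_retries(make_flaky(count_words, 2), split))  # (7, 3)
```

The retry loop never inspects what the failed attempt did, which is exactly why map-reduce gets restartability without per-process checkpoints.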
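The "nearly universal communication" in the quoted n-body example can be seen in a direct-summation force loop: every body's acceleration depends on every other body's position. This is a minimal sketch of direct summation only. Production codes often use tree or multipole methods instead, and the `softening` parameter is an illustrative assumption that defaults to off.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def pair_count(n):
    """Distinct body pairs that direct summation evaluates per time step."""
    return n * (n - 1) // 2

def accelerations(positions, masses, softening=0.0):
    """Direct-sum gravitational acceleration on every body.
    The inner loop reads every other body's position, so if bodies were
    split across machines, each machine would need data from all the
    others on every time step."""
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + softening ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * masses[j] * d[k] * inv_r3
    return acc

print(pair_count(10**6))  # 499999500000
```

At n = 10^6 that is roughly 5 x 10^11 pairwise interactions per step, each needing data that may live on another node. Contrast this with the shuffle above, which happens once per job rather than once per time step.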
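The checkpointing requirement quoted from the Sun engineers can be turned into a minimum write bandwidth by dividing data volume by the time budget. This is a back-of-the-envelope sketch. The node count of 1000 is hypothetical (the message gives none), and 10 TB is the low end of "tens of TB", so a real requirement would be at least this high.

```python
TB = 10**12  # decimal terabyte, in bytes

def checkpoint_bandwidth(total_bytes, seconds, nodes):
    """Minimum sustained write rate needed to finish a checkpoint in time.
    Returns (aggregate bytes/s, bytes/s per node), assuming the dump is
    spread evenly across `nodes`."""
    aggregate = total_bytes / seconds
    return aggregate, aggregate / nodes

# Hypothetical figures: 10 TB written by 1000 nodes in 10 seconds.
aggregate, per_node = checkpoint_bandwidth(10 * TB, 10, 1000)
print(f"aggregate: {aggregate / TB:.1f} TB/s, per node: {per_node / 10**9:.1f} GB/s")
# prints "aggregate: 1.0 TB/s, per node: 1.0 GB/s"
```

Even under these mild assumptions, each node must sustain on the order of a gigabyte per second of writes for the whole window, which is why such systems are built around memory and fast interconnects rather than the disk-bound, batch-oriented design Hadoop uses.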