From: "Terrence A. Pietrondi"
Subject: Re: architecture diagram
To: core-user@hadoop.apache.org
Date: Wed, 1 Oct 2008 13:11:16 -0700 (PDT)

So to be "distributed" in a sense, you would want to do your computation on the disconnected parts of data in the map phase I would guess?

Terrence A. Pietrondi
http://del.icio.us/tepietrondi

--- On Wed, 10/1/08, Arun C Murthy wrote:

> From: Arun C Murthy
> Subject: Re: architecture diagram
> To: core-user@hadoop.apache.org
> Date: Wednesday, October 1, 2008, 2:16 PM
>
> On Oct 1, 2008, at 10:17 AM, Terrence A. Pietrondi wrote:
>
> > I am trying to plan out my map-reduce implementation and I have some
> > questions of where computation should be split in order to take
> > advantage of the distributed nodes.
> >
> > Looking at the architecture diagram
> > (http://hadoop.apache.org/core/images/architecture.gif), are the map
> > boxes the major computation areas or is the reduce the major
> > computation area?
>
> Usually the maps perform the 'embarrassingly parallel' computational
> steps where-in each map works independently on a 'split' on your input
> and the reduces perform the 'aggregate' computations.
>
> From http://hadoop.apache.org/core/ :
>
> Hadoop implements MapReduce, using the Hadoop Distributed File System
> (HDFS). MapReduce divides applications into many small blocks of work.
> HDFS creates multiple replicas of data blocks for reliability, placing
> them on compute nodes around the cluster. MapReduce can then process
> the data where it is located.
>
> The Hadoop Map-Reduce framework is quite good at scheduling your
> 'maps' on the actual data-nodes where the input-blocks are present,
> leading to i/o efficiencies...
>
> Arun
>
> > Thanks.
> >
> > Terrence A. Pietrondi
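
For anyone reading this thread later, the division of labor described above (maps working independently on splits, reduces aggregating) can be sketched in a few lines of plain Python. This is only an in-memory illustration and not the Hadoop API: the function names (map_split, shuffle, reduce_key, run_job) and the word-count task are assumptions picked for the example, and a real job would leave the splitting, scheduling, and grouping to the framework.

```python
from collections import defaultdict


def map_split(split):
    """Map step: sees only its own split and emits (word, 1) pairs."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(mapped_outputs):
    """Group emitted values by key, standing in for what the framework
    does between the map and reduce phases."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_key(key, values):
    """Reduce step: aggregate every value collected for one key."""
    return key, sum(values)


def run_job(splits):
    # Each map_split call depends on nothing but its own split, so the
    # calls could run on different nodes (hypothetically, the nodes that
    # already hold that split's data blocks).
    mapped = [map_split(split) for split in splits]
    grouped = shuffle(mapped)
    return dict(reduce_key(key, values) for key, values in grouped.items())


# Two hypothetical input splits, each a list of text lines.
splits = [
    ["the map does the work", "the reduce aggregates"],
    ["the data stays where it is"],
]
counts = run_job(splits)
print(counts)
```

In this sketch the per-record work sits in map_split, which never looks outside its own split, while reduce_key only combines values that were already computed. That matches the guess in the reply: work that can be done on disconnected parts of the data belongs in the map phase.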