Mailing-List: contact core-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: core-user@hadoop.apache.org
Date: Wed, 1 Oct 2008 13:11:16 -0700 (PDT)
From: "Terrence A. Pietrondi" <tepietrondi@yahoo.com>
Subject: Re: architecture diagram
To: core-user@hadoop.apache.org
In-Reply-To: <83FFA339-ECBC-4605-9C76-F723E68FF941@yahoo-inc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Message-ID: <697959.95991.qm@web35705.mail.mud.yahoo.com>

So, to be "distributed" in that sense, I would guess you want to do your computation on the independent parts of the data in the map phase, and leave only the combining of results to the reduce phase?
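
To make the question concrete, here is a minimal sketch of how I understand the split. It is plain Python, not the Hadoop API; the function names (map_split, shuffle, reduce_counts, run_job) and the word-count task are made up for illustration. Each map call sees only its own split, and the reduce step aggregates what the maps emit:

```python
from collections import defaultdict


def map_split(split):
    """Map step: works on one split only and emits (word, 1) pairs."""
    return [(word, 1) for line in split for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, standing in for the framework's shuffle."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: aggregate every value emitted for one key."""
    return key, sum(values)


def run_job(splits):
    """Run the maps split by split, then reduce the grouped output."""
    # On a real cluster each map_split call could run on a different node;
    # here they simply run one after another.
    mapped = [pair for split in splits for pair in map_split(split)]
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    # Two hypothetical splits of the input, each a list of lines.
    splits = [["a b a"], ["b c"]]
    print(run_job(splits))
```

Since no map call depends on another split, the maps are the part that could be spread across nodes; only the reduce needs to see results from all of them.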

Terrence A. Pietrondi
http://del.icio.us/tepietrondi


--- On Wed, 10/1/08, Arun C Murthy <acm@yahoo-inc.com> wrote:

> From: Arun C Murthy <acm@yahoo-inc.com>
> Subject: Re: architecture diagram
> To: core-user@hadoop.apache.org
> Date: Wednesday, October 1, 2008, 2:16 PM
> On Oct 1, 2008, at 10:17 AM, Terrence A. Pietrondi wrote:
> 
> > I am trying to plan out my map-reduce implementation and I have some
> > questions of where computation should be split in order to take
> > advantage of the distributed nodes.
> >
> > Looking at the architecture diagram
> > (http://hadoop.apache.org/core/images/architecture.gif), are the map
> > boxes the major computation areas or is the reduce the major
> > computation area?
> >
> 
> Usually the maps perform the 'embarrassingly parallel' computational
> steps, wherein each map works independently on a 'split' of your input,
> and the reduces perform the 'aggregate' computations.
> 
>  From http://hadoop.apache.org/core/ :
> 
> Hadoop implements MapReduce, using the Hadoop Distributed File System
> (HDFS). MapReduce divides applications into many small blocks of work.
> HDFS creates multiple replicas of data blocks for reliability, placing
> them on compute nodes around the cluster. MapReduce can then process
> the data where it is located.
> 
> The Hadoop Map-Reduce framework is quite good at scheduling your
> 'maps' on the actual data-nodes where the input-blocks are present,
> leading to I/O efficiencies...
> 
> Arun
> 
> > Thanks.
> >
> > Terrence A. Pietrondi
> >
> >
> >