Subject: Re: basic question about rack awareness and computation migration
From: Julian Bui <julianbui@gmail.com>
To: user@hadoop.apache.org
Date: Tue, 5 Mar 2013 15:50:37 -0800
In-Reply-To: <83C2D687-B3E3-45F3-A36A-9F1AFE1FE6AD@gmail.com>

Hi Rohit,

Thanks for responding.

> a task can be scheduled by hadoop to be executed on the same node that
> holds the data.

In my case, the mapper won't actually know where the data resides at the
time it is scheduled. It only learns what data it will be accessing when it
reads in the keys. In other words, the task will already be running by the
time the mapper figures out what data must be accessed - so how can hadoop
know where to execute the code?

I'm still lost. Please help if you can.

-Julian

On Tue, Mar 5, 2013 at 11:15 AM, Rohit Kochar <mnit.rohit@gmail.com> wrote:

> Hello,
> To be precise, this is hidden from the developer and you need not write
> any code for it.
> Whenever a file is stored in HDFS it is split into blocks of the
> configured size, and each block may be stored on a different datanode.
> All the information about which blocks belong to which file resides with
> the namenode.
>
> So essentially, whenever a file is accessed via the DFS client, the
> client requests the metadata from the NameNode, which it then uses to
> stream the file to the end user.
> Since the namenode knows the location of all the blocks/files, a task
> can be scheduled by hadoop to be executed on the same node that holds
> the data.
>
> Thanks
> Rohit Kochar
>
> On 05-Mar-2013, at 5:19 PM, Julian Bui wrote:
>
> > Hi hadoop users,
> >
> > I'm trying to find out if computation migration is something the
> > developer needs to worry about or if it's supposed to be hidden.
> >
> > I would like to use hadoop to take in a list of image paths in the
> > HDFS and then have each task compress these large, raw images into
> > something much smaller - say jpeg files.
> >
> > Input: list of paths
> > Output: compressed jpegs
> >
> > Since I don't really need a reduce task (I'm mostly using hadoop for
> > its reliability and orchestration aspects), my mapper ought to just
> > take the list of image paths and then work on them. As I understand
> > it, each image will likely be on multiple data nodes.
> >
> > My question is: how will each mapper task "migrate the computation" to
> > the data nodes? I recall reading that the namenode is supposed to deal
> > with this. Is it hidden from the developer? Or, as the developer, do I
> > need to discover where the data lies and then migrate the task to that
> > node? Since my input is just a list of paths, it seems like the
> > namenode couldn't really do this for me.
> >
> > Another question: where can I find out more about this? I've looked up
> > "rack awareness" and "computation migration" but haven't really found
> > much code relating to either one - leading me to believe I'm not
> > supposed to have to write code to deal with this.
> >
> > Anyway, could someone please help me out or set me straight on this?
> >
> > Thanks,
> > -Julian
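For readers of the archive: the node-local placement Rohit describes can be sketched in a few lines of plain Java. This is an illustration only - `pickNode`, `LocalityDemo`, and the node names are invented for the example and are not Hadoop API; in a real cluster the decision is made by the scheduler using the hosts each InputSplit reports.

```java
import java.util.*;

// Toy model of Hadoop's locality preference when assigning a map task.
// All names here are hypothetical; this is not the Hadoop API.
public class LocalityDemo {
    // Given the hosts holding a split's blocks (from namenode metadata)
    // and the nodes with a free map slot, prefer a node that already has
    // the data ("node-local"); otherwise fall back to any free node.
    static String pickNode(List<String> splitHosts, List<String> freeNodes) {
        for (String node : freeNodes) {
            if (splitHosts.contains(node)) {
                return node;          // node-local: no data moves over the network
            }
        }
        return freeNodes.get(0);      // off-node: blocks must be streamed in
    }

    public static void main(String[] args) {
        // With replication 3, the split's blocks live on these datanodes.
        List<String> splitHosts = Arrays.asList("node2", "node5", "node7");
        List<String> freeNodes  = Arrays.asList("node1", "node5", "node9");
        System.out.println(pickNode(splitHosts, freeNodes)); // prints "node5"
    }
}
```

The real scheduler also distinguishes a rack-local middle ground between node-local and off-rack, which this sketch omits.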
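One possible direction for Julian's situation (a sketch, not the thread's conclusion): because his job input is a list of paths, the splits of that list file carry no locality for the images themselves. A custom InputFormat could instead make each image its own split and report that image's block hosts as the split's locations, letting the scheduler do the placement. In real Hadoop that means overriding getSplits() and asking the namenode via FileSystem#getFileBlockLocations; the toy code below stubs that metadata with a map so it stays self-contained, and `ImageSplit`/`makeSplits` are invented names.

```java
import java.util.*;

// Sketch: build one "split" per image path and attach the hosts holding
// that image. The blockHosts map stands in for what
// FileSystem#getFileBlockLocations would return from the namenode;
// ImageSplit and makeSplits are hypothetical, not Hadoop API.
public class ImageSplitDemo {
    static final class ImageSplit {
        final String path;
        final List<String> hosts;   // preferred nodes, like InputSplit#getLocations()
        ImageSplit(String path, List<String> hosts) {
            this.path = path;
            this.hosts = hosts;
        }
    }

    static List<ImageSplit> makeSplits(List<String> imagePaths,
                                       Map<String, List<String>> blockHosts) {
        List<ImageSplit> splits = new ArrayList<>();
        for (String p : imagePaths) {
            // Unknown paths get no preferred hosts; the scheduler would
            // then place the task anywhere with a free slot.
            splits.add(new ImageSplit(p, blockHosts.getOrDefault(p, Collections.emptyList())));
        }
        return splits;
    }

    public static void main(String[] args) {
        Map<String, List<String>> blockHosts = new HashMap<>();
        blockHosts.put("/imgs/a.raw", Arrays.asList("node2", "node4"));
        blockHosts.put("/imgs/b.raw", Arrays.asList("node1"));
        for (ImageSplit s : makeSplits(Arrays.asList("/imgs/a.raw", "/imgs/b.raw"), blockHosts)) {
            System.out.println(s.path + " -> " + s.hosts);
        }
    }
}
```

With splits built this way, the scheduler can place each compression task next to its image up front, instead of the mapper discovering the path only at read time, as in the original design.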