Date: Tue, 5 Mar 2013 03:49:24 -0800
Subject: basic question about rack awareness and computation migration
From: Julian Bui
To: user@hadoop.apache.org

Hi hadoop users,

I'm trying to find out if computation migration is something the developer needs to worry about or if it's supposed to be hidden.

I would like to use hadoop to take in a list of image paths in the hdfs and then have each task compress these large, raw images into something much smaller - say jpeg files.

Input: list of paths
Output: compressed jpeg

Since I don't really need a reduce task (I'm more using hadoop for its reliability and orchestration aspects), my mapper ought to just take the list of image paths and then work on them. As I understand it, each image will likely be on multiple data nodes.

My question is how will each mapper task "migrate the computation" to the data nodes? I recall reading that the namenode is supposed to deal with this. Is it hidden from the developer? Or as the developer, do I need to discover where the data lies and then migrate the task to that node? Since my input is just a list of paths, it seems like the namenode couldn't really do this for me.

Another question: Where can I find out more about this? I've looked up "rack awareness" and "computation migration" but haven't really found much code relating to either one - leading me to believe I'm not supposed to have to write code to deal with this.

Anyway, could someone please help me out or set me straight on this?

Thanks,
-Julian