Reply-To: user@hadoop.apache.org
MIME-Version: 1.0
Date: Tue, 5 Mar 2013 03:49:24 -0800
Message-ID: 
 <CAFWc0y0v9Qev82_6Ge1Qv7J=ye_aAtByEccz419-sZpeoPVi6A@mail.gmail.com>
Subject: basic question about rack awareness and computation migration
From: Julian Bui <julianbui@gmail.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=f46d042c6423aab34f04d72c1066

--f46d042c6423aab34f04d72c1066
Content-Type: text/plain; charset=ISO-8859-1

Hi Hadoop users,

I'm trying to find out whether computation migration is something the developer
needs to worry about or whether it's supposed to be hidden.

I would like to use Hadoop to take in a list of image paths in HDFS and
then have each task compress these large, raw images into something much
smaller, say JPEG files.

Input: list of image paths
Output: compressed JPEG files

Since I don't really need a reduce task (I'm using Hadoop more for its
reliability and orchestration), my mapper ought to just take the list of
image paths and work on them. As I understand it, each image will likely
be replicated across multiple data nodes.
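For illustration, here is a minimal sketch of the map-only mapper described above, written for Hadoop Streaming (which feeds each line of the input to the mapper on stdin and collects what it writes to stdout). The names `compress_to_jpeg`, `map_paths` and `main` and the ".jpg" naming rule are hypothetical; `compress_to_jpeg` is only a placeholder and does not read or write any image.

```python
#!/usr/bin/env python3
"""Hypothetical map-only Hadoop Streaming mapper: one image path per input line."""
import posixpath
import sys
from typing import Callable, Iterable, Iterator, TextIO


def compress_to_jpeg(src_path: str) -> str:
    """Hypothetical placeholder for the real work: read the raw image at
    src_path, write a JPEG, and return the JPEG's path. Here it only
    derives the output name; no image is read or written."""
    root, _ext = posixpath.splitext(src_path)
    return root + ".jpg"


def map_paths(lines: Iterable[str],
              compress: Callable[[str], str] = compress_to_jpeg) -> Iterator[str]:
    """Yield one 'input<TAB>output' record per non-blank input line."""
    for line in lines:
        path = line.strip()
        if path:  # skip blank lines in the path list
            yield f"{path}\t{compress(path)}"


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Streaming entry point: read paths from stdin, write records to stdout."""
    for record in map_paths(stdin):
        stdout.write(record + "\n")
```

As a standalone script you would call `main()` at module level and pass the file to the Streaming jar with `-mapper`, together with `-numReduceTasks 0` so the job runs without a reduce phase.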

My question is: how will each mapper task "migrate the computation" to the
data nodes? I recall reading that the namenode is supposed to deal with
this. Is it hidden from the developer? Or, as the developer, do I need to
discover where the data lives and then migrate the task to that node? Since
my input is just a list of paths, it seems like the namenode couldn't
do this for me.
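A toy model of the placement decision in question may help frame it. This is not Hadoop's scheduler; it only sketches the usual preference order: run a task on a node that holds its data (node-local), else on a node in the same rack (rack-local), else anywhere. The function names, host names and the one-slot-per-node rule are hypothetical. Here `block_hosts` stands for whatever locations the input split reports; with a plain text file of paths read through the default text input format, those are the hosts of that list file, not of the images it names.

```python
from typing import Dict, List, Optional, Set, Tuple

DEFAULT_RACK = "/default-rack"  # rack assumed for hosts with no known mapping

Placement = Tuple[Optional[str], str]  # (chosen node or None, locality label)


def pick_node(block_hosts: List[str], free_nodes: Set[str],
              rack_of: Dict[str, str]) -> Placement:
    """Choose a node for one task, preferring data locality."""
    # 1. Node-local: a free node that already stores the data.
    for host in block_hosts:
        if host in free_nodes:
            return host, "node-local"
    # 2. Rack-local: a free node on the same rack as some replica.
    wanted = {rack_of.get(h, DEFAULT_RACK) for h in block_hosts}
    for node in sorted(free_nodes):
        if rack_of.get(node, DEFAULT_RACK) in wanted:
            return node, "rack-local"
    # 3. Off-rack: any free node at all (data must cross racks).
    if free_nodes:
        return min(free_nodes), "off-rack"
    return None, "unscheduled"


def assign(tasks: Dict[str, List[str]], nodes: Set[str],
           rack_of: Dict[str, str]) -> Dict[str, Placement]:
    """Place each task (keyed by image path) on at most one free node."""
    free = set(nodes)
    plan: Dict[str, Placement] = {}
    for path in sorted(tasks):
        node, locality = pick_node(tasks[path], free, rack_of)
        if node is not None:
            free.discard(node)  # toy rule: one task slot per node
        plan[path] = (node, locality)
    return plan
```

For example, with `dn1` and `dn2` on `/r1` and `dn3` on `/r2`, an image whose only replica sits on an already-busy `dn1` lands on `dn2` as rack-local.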

Another question: where can I find out more about this? I've looked up
"rack awareness" and "computation migration" but haven't found much
code relating to either one, which leads me to believe I'm not supposed
to have to write code to deal with this.
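On the rack-awareness side, the piece usually supplied is a cluster-level topology script rather than job code. As I understand Hadoop's script-based mapping, the script named by `net.topology.script.file.name` (`topology.script.file.name` in 1.x) is invoked with one or more host names or IP addresses as arguments and should print one rack ID per argument, with unknown hosts conventionally reported as `/default-rack`. A minimal sketch follows; the addresses and rack names in `RACKS` are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical rack topology script: maps host names/IPs to rack IDs."""
import sys
from typing import Dict, List

# Hypothetical cluster layout; in practice this comes from your own inventory.
RACKS: Dict[str, str] = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
}

DEFAULT_RACK = "/default-rack"


def resolve(hosts: List[str], table: Dict[str, str] = RACKS) -> List[str]:
    """Return one rack ID per host, falling back to the default rack."""
    return [table.get(h, DEFAULT_RACK) for h in hosts]


if __name__ == "__main__":
    # Hosts arrive as command-line arguments; print one rack ID per host.
    for rack in resolve(sys.argv[1:]):
        print(rack)
```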

Anyway, could someone please help me out or set me straight on this?

Thanks,
-Julian

--f46d042c6423aab34f04d72c1066--