From: Ling Kun
Date: Fri, 22 Feb 2013 15:56:33 +0800
Subject: Re: How to add another file system in Hadoop
To: user@hadoop.apache.org

Dear Nikhil and all,

    Your question is a bit complex to answer, and since I am not an expert on Hadoop, the following answer may contain some errors; any corrections are welcome.

1. Your MR command is issued by the client, which submits a job to the JobTracker of the Hadoop cluster.

2. The JobTracker will split the input file (usually according to the block size of the underlying DFS) and will then create a number of map tasks and reduce tasks; usually each map task consumes one block and writes out some intermediate data.

3. The JobTracker will schedule these tasks to different TaskTrackers according to the location in the DFS of the block each map task will consume. If, unfortunately, a map task cannot be assigned to a TaskTracker that stores its block, the data of the block will be transferred to the node where the task will run (this is done inside the underlying DFS object, and it is where *getFileBlockLocations* takes effect; the MR framework does not even notice it). A small sketch of that call is below.

4. So, as you can see, your client will not collect all the remote data locally. It only submits a job that tells the JobTracker how to split the input file, how to do the map, how to combine the intermediate data, how to do the reduce, where the input file is in the DFS, and where to write the output in the DFS.
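To make point 3 a bit more concrete, here is a minimal sketch (not part of the original mail) of how block locations can be queried through the FileSystem API; the input path is just a made-up example:

    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();        // picks up core-site.xml etc.
        FileSystem fs = FileSystem.get(conf);            // the configured default file system
        Path input = new Path("/user/nikhil/input.txt"); // hypothetical input file

        FileStatus status = fs.getFileStatus(input);
        // One BlockLocation per block of the file, each listing the hosts that
        // store a replica; the JobTracker uses this information to place map
        // tasks close to the data.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
          System.out.println("offset=" + b.getOffset()
              + " length=" + b.getLength()
              + " hosts=" + Arrays.toString(b.getHosts()));
        }
      }
    }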
Maybe you should also search for some blog posts, or refer to "Hadoop: The Definitive Guide" written by Tom White, for a more authoritative answer.

yours,
Ling Kun

On Fri, Feb 22, 2013 at 1:05 PM, Agarwal, Nikhil <Nikhil.Agarwal@netapp.com> wrote:

> Hi All,
>
> Thanks a lot for taking out your time to answer my question.
>
> Ling, thank you for directing me to glusterfs. I can surely take a lot of
> help from that, but what I wanted to know is that in README.txt it is
> mentioned:
>
> >> # ./bin/start-mapred.sh
>
> If the map/reduce job/task trackers are up, all I/O will be done to
> GlusterFS.
>
> So, suppose my input files are scattered across different nodes (glusterfs
> servers), how do I (a hadoop client with glusterfs plugged in) issue a
> MapReduce command?
>
> Moreover, after issuing a MapReduce command, would my hadoop client fetch
> all the data from the different servers to my local machine and then do the
> MapReduce, or would it start the TaskTracker daemons on the machine(s) where
> the input file(s) are located and perform the MapReduce there?
>
> Please rectify me if I am wrong, but I suppose that the location of the input
> files to MapReduce is being returned by the function
> *getFileBlockLocations*(FileStatus file, long start, long len).
>
> Thank you very much for your time and helping me out.
>
> Regards,
> Nikhil
>
> *From:* Agarwal, Nikhil
> *Sent:* Thursday, February 21, 2013 4:19 PM
> *To:* 'user@hadoop.apache.org'
> *Subject:* How to add another file system in Hadoop
>
> Hi,
>
> I am planning to add a file system called CDMI under org.apache.hadoop.fs
> in Hadoop, something similar to KFS or S3 which are already there under
> org.apache.hadoop.fs. I wanted to ask: say I write my file system for CDMI
> and add the package under fs, but then how do I tell core-site.xml or other
> configuration files to use the CDMI file system? Where all do I need to make
> changes to enable the CDMI file system to become a part of Hadoop?
>
> Thanks a lot in advance.
>
> Regards,
> Nikhil

--
http://www.lingcc.com
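On the configuration question quoted above, here is a minimal sketch (again, not from the original thread) of the usual Hadoop mechanism: the implementation class is registered under a fs.<scheme>.impl property. The "cdmi" scheme, class name, and URIs below are assumptions made up for illustration; there is no such implementation in Hadoop:

    import java.io.InputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CdmiWiringDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Equivalent to adding this to core-site.xml:
        //   <property>
        //     <name>fs.cdmi.impl</name>
        //     <value>org.apache.hadoop.fs.cdmi.CDMIFileSystem</value>  (hypothetical class)
        //   </property>
        // where CDMIFileSystem extends org.apache.hadoop.fs.FileSystem.
        conf.set("fs.cdmi.impl", "org.apache.hadoop.fs.cdmi.CDMIFileSystem");

        // Optionally make it the default file system for all job I/O
        // (Hadoop 1.x property name):
        // conf.set("fs.default.name", "cdmi://cdmi-host:9000/");

        // Any cdmi:// path is now routed to the registered implementation.
        FileSystem fs = FileSystem.get(new Path("cdmi://cdmi-host:9000/").toUri(), conf);
        InputStream in = fs.open(new Path("cdmi://cdmi-host:9000/some/file")); // hypothetical path
        in.close();
      }
    }

Once that property (and the jar containing the class) is available on every node, normally via core-site.xml and the Hadoop classpath, MapReduce jobs should be able to read and write cdmi:// paths like any other Hadoop file system.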