From: Alejandro Abdelnur <tucu@cloudera.com>
Date: Sun, 16 Mar 2014 23:40:08 -0700
Subject: Re: Data Locality and WebHDFS
To: "common-user@hadoop.apache.org"

I may have expressed myself poorly. You don't need to run any test to see
how locality works with files of multiple blocks: if you are accessing a
file of more than one block over WebHDFS, you only have assured locality
for the first block of the file.

Thanks.
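P.S. An untested sketch, in case anyone wants to poke at it: this walks a
file block by block and asks the NameNode where each OPEN would be
redirected, without following the redirect. Python with the requests
library; the namenode host, user, and file path are placeholders, and it
assumes a non-secure cluster with the default 50070 NameNode HTTP port.

import requests

NAMENODE = "http://namenode.example.com:50070"  # placeholder host, default port
USER = "hdfs"                                   # placeholder pseudo-auth user

def block_locations(path):
    """Yield (offset, redirect_url) for each block boundary of `path`."""
    # File length and block size come from the documented GETFILESTATUS op.
    status = requests.get(
        NAMENODE + "/webhdfs/v1" + path,
        params={"op": "GETFILESTATUS", "user.name": USER},
    ).json()["FileStatus"]

    for offset in range(0, status["length"], status["blockSize"]):
        # OPEN at the block boundary, but don't follow the 307 redirect;
        # the Location header names the datanode the NameNode picked for
        # the block containing this offset.
        resp = requests.get(
            NAMENODE + "/webhdfs/v1" + path,
            params={"op": "OPEN", "offset": offset, "user.name": USER},
            allow_redirects=False,
        )
        yield offset, resp.headers["Location"]

for offset, url in block_locations("/user/hdfs/big-file.bin"):
    print(offset, url)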
On Sun, Mar 16, 2014 at 9:18 PM, RJ Nowling <rnowling@gmail.com> wrote:

> Thank you, Mingjiang and Alejandro.
>
> This is interesting. Since we will use the data locality information for
> scheduling, we could "hack" this to get the data locality information, at
> least for the first block. As Alejandro says, we'd have to test what
> happens for other data blocks -- e.g., what if, knowing the block sizes,
> we request the second or third block?
>
> Interesting food for thought! I see some experiments in my future!
>
> Thanks!
>
>
> On Sun, Mar 16, 2014 at 10:14 PM, Alejandro Abdelnur <tucu@cloudera.com> wrote:
>
>> Well, this is for the first block of the file; the rest of the file
>> (whether its blocks are local or not) is streamed out by the same
>> datanode. For small files (one block) you'll get locality; for large
>> files, only for the first block -- and, by chance, for any other blocks
>> that happen to be local to that datanode.
>>
>> Alejandro
>> (phone typing)
>>
>> On Mar 16, 2014, at 18:53, Mingjiang Shi wrote:
>>
>> According to this page:
>> http://hortonworks.com/blog/webhdfs-%E2%80%93-http-rest-access-to-hdfs/
>>
>>> *Data Locality*: The file read and file write calls are redirected to
>>> the corresponding datanodes. It uses the full bandwidth of the Hadoop
>>> cluster for streaming data.
>>>
>>> *A HDFS Built-in Component*: WebHDFS is a first class built-in
>>> component of HDFS. It runs inside Namenodes and Datanodes, therefore,
>>> it can use all HDFS functionalities. It is a part of HDFS -- there are
>>> no additional servers to install.
>>
>> So it looks like data locality is built into WebHDFS: the client will be
>> redirected to the data node automatically.
>>
>>
>> On Mon, Mar 17, 2014 at 6:07 AM, RJ Nowling <rnowling@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I'm writing up a Google Summer of Code proposal to add HDFS support to
>>> Disco, an Erlang MapReduce framework.
>>>
>>> We're interested in using WebHDFS. I have two questions:
>>>
>>> 1) Does WebHDFS allow querying data locality information?
>>>
>>> 2) If the data locality information is known, can data on specific data
>>> nodes be accessed via WebHDFS? Or do all WebHDFS requests have to go
>>> through a single server?
>>>
>>> Thanks,
>>> RJ
>>>
>>> --
>>> em rnowling@gmail.com
>>> c 954.496.2314
>>
>> --
>> Cheers
>> -MJ
>
> --
> em rnowling@gmail.com
> c 954.496.2314

--
Alejandro
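To RJ's second question in the thread above: the metadata lookup always
goes through the NameNode, but the bytes are served by whichever datanode
the redirect names, and a client that has already captured that URL can
fetch from it directly. Another untested sketch under the same assumptions,
reusing the placeholder block_locations helper from the P.S. above; the
128 MB block size is the Hadoop 2.x default, not something to hard-code in
real use.

import requests

def read_range(redirect_url, nbytes):
    # The redirect URL already carries op=OPEN and the chosen offset;
    # adding `length` caps the read so we pull at most one block's worth.
    resp = requests.get(redirect_url, params={"length": nbytes}, stream=True)
    resp.raise_for_status()
    return resp.raw  # file-like stream served by the datanode itself

blocks = list(block_locations("/user/hdfs/big-file.bin"))
offset, url = blocks[1]                            # second block; assumes >= 2 blocks
data = read_range(url, 128 * 1024 * 1024).read()   # assumed 128 MB block size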