hadoop-common-user mailing list archives

From RJ Nowling <rnowl...@gmail.com>
Subject Re: Data Locality and WebHDFS
Date Mon, 17 Mar 2014 21:52:21 GMT
Thank you, Tsz.  That helps!


On Mon, Mar 17, 2014 at 2:30 PM, Tsz Wo Sze <szetszwo@yahoo.com> wrote:

> The file offset is considered in WebHDFS redirection.  It redirects to a
> datanode with the first block the client is going to read, not the first
> block of the file.
>
> Hope it helps.
> Tsz-Wo
>
>
>   On Monday, March 17, 2014 10:09 AM, Alejandro Abdelnur <
> tucu@cloudera.com> wrote:
>
> actually, i am wrong, the webhdfs rest call has an offset.
>
> Alejandro
> (phone typing)
>
> On Mar 17, 2014, at 10:07, Alejandro Abdelnur <tucu@cloudera.com> wrote:
>
> don't recall how skips are handled in webhdfs, but i would assume that
> you'll get to the first block as usual, and the skip is handled by the DN
> serving the file (as webhdfs does not know at open that you'll skip)
>
> Alejandro
> (phone typing)
>
> On Mar 17, 2014, at 9:47, RJ Nowling <rnowling@gmail.com> wrote:
>
> Hi Alejandro,
>
> The WebHDFS API allows specifying an offset and length for the request.
> If I specify an offset that starts in the second block of a file (thus
> skipping the first block altogether), will the namenode still direct me
> to a datanode with the first block or will it direct me to a datanode with
> the second block?  I.e., am I assured data locality only on the first block
> of the file (as you're saying) or on the first block I am accessing?
>
> If it is as you say, then I may want to reach out to the WebHDFS developers
> and see if they would be interested in the additional functionality.
>
> Thank you,
> RJ
>
>
> On Mon, Mar 17, 2014 at 2:40 AM, Alejandro Abdelnur <tucu@cloudera.com> wrote:
>
> I may have expressed myself wrong. You don't need to do any test to see
> how locality works with files of multiple blocks. If you are accessing a
> file of more than one block over webhdfs, you only have assured locality
> for the first block of the file.
>
> Thanks.
>
>
> On Sun, Mar 16, 2014 at 9:18 PM, RJ Nowling <rnowling@gmail.com> wrote:
>
> Thank you, Mingjiang and Alejandro.
>
> This is interesting.  Since we will use the data locality information for
> scheduling, we could "hack" this to get the data locality information, at
> least for the first block.  As Alejandro says, we'd have to test what
> happens for other data blocks -- e.g., what if, knowing the block sizes, we
> request the second or third block?
>
> Interesting food for thought!  I see some experiments in my future!
>
> Thanks!
>
>
> On Sun, Mar 16, 2014 at 10:14 PM, Alejandro Abdelnur <tucu@cloudera.com> wrote:
>
> well, this is for the first block of the file; the rest of the file
> (blocks being local or not) is streamed out by the same datanode. for
> small files (one block) you'll get locality; for large files you get it
> only for the first block, and for other blocks only by chance, if they
> happen to be local to that datanode.
>
>
> Alejandro
> (phone typing)
>
> On Mar 16, 2014, at 18:53, Mingjiang Shi <mshi@gopivotal.com> wrote:
>
> According to this page:
> http://hortonworks.com/blog/webhdfs-%E2%80%93-http-rest-access-to-hdfs/
>
> *Data Locality*: The file read and file write calls are redirected to the
> corresponding datanodes. It uses the full bandwidth of the Hadoop cluster
> for streaming data.
> *A HDFS Built-in Component*: WebHDFS is a first class built-in component
> of HDFS. It runs inside Namenodes and Datanodes, therefore, it can use all
> HDFS functionalities. It is a part of HDFS - there are no additional
> servers to install
>
>
> So it looks like data locality is built into webhdfs; the client will be
> redirected to the datanode automatically.
>
>
>
>
> On Mon, Mar 17, 2014 at 6:07 AM, RJ Nowling <rnowling@gmail.com> wrote:
>
> Hi all,
>
> I'm writing up a Google Summer of Code proposal to add HDFS support to
> Disco, an Erlang MapReduce framework.
>
> We're interested in using WebHDFS.  I have two questions:
>
> 1) Does WebHDFS allow querying data locality information?
>
> 2) If the data locality information is known, can data on specific data
> nodes be accessed via Web HDFS?  Or do all Web HDFS requests have to go
> through a single server?
>
> Thanks,
> RJ
>
> --
> em rnowling@gmail.com
> c 954.496.2314
>
>
>
>
> --
> Cheers
> -MJ
>
>
>
>
> --
> em rnowling@gmail.com
> c 954.496.2314
>
>
>
>
> --
> Alejandro
>
>
>
>
> --
> em rnowling@gmail.com
> c 954.496.2314
>
>
>
>


-- 
em rnowling@gmail.com
c 954.496.2314
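
For anyone finding this thread later, here is a rough Python sketch of the request discussed above: an OPEN call whose offset is the start of a chosen block, so the namenode's redirect should point at a datanode holding that block (per Tsz-Wo's reply). The host name, port 50070, the file path, and the 128 MB block size are placeholders I made up for illustration, and the helper names are hypothetical. It assumes the namenode answers OPEN with an HTTP redirect whose Location header names the datanode, as the WebHDFS REST documentation describes.

```python
from urllib.parse import quote, urlencode, urlsplit
import http.client


def block_offset(block_index, block_size):
    """Byte offset at which the given zero-based block starts."""
    return block_index * block_size


def open_url(namenode, path, offset, length=None):
    """Build a WebHDFS OPEN URL for `path`, starting at byte `offset`."""
    params = {"op": "OPEN", "offset": offset}
    if length is not None:
        params["length"] = length
    return "http://{}/webhdfs/v1{}?{}".format(
        namenode, quote(path), urlencode(params))


def redirect_target(url):
    """Send the GET without following redirects.

    Returns (status, Location header). http.client never follows
    redirects on its own, so the datanode URL can be inspected
    before any data is fetched.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc)
    try:
        conn.request("GET", parts.path + "?" + parts.query)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


# Placeholder values: adjust to the actual cluster and file.
BLOCK_SIZE = 128 * 1024 * 1024
url = open_url("namenode.example.com:50070", "/data/input.txt",
               offset=block_offset(1, BLOCK_SIZE), length=BLOCK_SIZE)
```

Calling redirect_target(url) against a real cluster would then show which datanode the namenode picked for the second block; a scheduler could use the host in that Location value as a locality hint. I have not verified this against any particular Hadoop release.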
