hadoop-mapreduce-user mailing list archives

From Alejandro Abdelnur <t...@cloudera.com>
Subject Re: Data Locality and WebHDFS
Date Mon, 17 Mar 2014 02:14:43 GMT
Well, this applies only to the first block of the file; the rest of the file (whether its blocks are local or
not) is streamed out by the same datanode. For small files (one block) you'll get locality;
for large files you get it only for the first block, plus any other blocks that happen to be local to that datanode.
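One possible workaround follows from this, though it is an assumption and is not confirmed in this thread: instead of a single OPEN for the whole file, issue one OPEN per block using the documented `offset` and `length` parameters, so that each block read goes through its own namenode redirect. The sketch below only builds those URLs. The helper name, namenode address and path are hypothetical; the file size and block size are assumed to be known already (for example from the `length` and `blockSize` fields returned by `op=GETFILESTATUS`); and whether the namenode actually picks a datanode holding that particular block should be checked against your Hadoop version.

```python
from urllib.parse import quote, urlencode


def per_block_open_urls(namenode, path, file_size, block_size, user=None):
    """Build one WebHDFS OPEN URL per HDFS block of a file.

    Hypothetical helper: `namenode` is "host:port" of the namenode's HTTP
    endpoint, and `file_size` / `block_size` are assumed to be known already.
    Each URL covers exactly one block's byte range via the `offset` and
    `length` query parameters, so every block read triggers its own redirect.
    """
    urls = []
    for offset in range(0, file_size, block_size):
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        params = {"op": "OPEN", "offset": offset, "length": length}
        if user is not None:
            params["user.name"] = user  # simple (pseudo) authentication
        urls.append(f"http://{namenode}/webhdfs/v1{quote(path)}?{urlencode(params)}")
    return urls


# Usage with hypothetical values: a 300 MiB file stored in 128 MiB blocks.
for url in per_block_open_urls("namenode.example.com:50070", "/data/big.log",
                               300 * 1024 * 1024, 128 * 1024 * 1024):
    print(url)
```

For a 300 MiB file with 128 MiB blocks this yields three requests, the last one with `length=46137344`. Each request can then be fetched independently (and in parallel), which is what gives the namenode a chance to route each block to a different datanode.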



Alejandro
(phone typing)

> On Mar 16, 2014, at 18:53, Mingjiang Shi <mshi@gopivotal.com> wrote:
> 
> According to this page: http://hortonworks.com/blog/webhdfs-%E2%80%93-http-rest-access-to-hdfs/
>> Data Locality: The file read and file write calls are redirected to the corresponding datanodes. It uses the full bandwidth of the Hadoop cluster for streaming data.
>> 
>> A HDFS Built-in Component: WebHDFS is a first class built-in component of HDFS. It runs inside Namenodes and Datanodes, therefore, it can use all HDFS functionalities. It is a part of HDFS – there are no additional servers to install
>> 
> 
> So it looks like data locality is built into WebHDFS: the client will be redirected to the datanode automatically.
> 
> 
> 
> 
>> On Mon, Mar 17, 2014 at 6:07 AM, RJ Nowling <rnowling@gmail.com> wrote:
>> Hi all,
>> 
>> I'm writing up a Google Summer of Code proposal to add HDFS support to Disco, an Erlang MapReduce framework.
>> 
>> We're interested in using WebHDFS.  I have two questions:
>> 
>> 1) Does WebHDFS allow querying data locality information?
>> 
>> 2) If the data locality information is known, can data on specific datanodes be accessed via WebHDFS? Or do all WebHDFS requests have to go through a single server?
>> 
>> Thanks,
>> RJ
>> 
>> -- 
>> em rnowling@gmail.com
>> c 954.496.2314
> 
> 
> 
> -- 
> Cheers
> -MJ
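The redirect described in this thread can be inspected directly. A client that does not follow redirects sees the namenode's 307 response to `op=OPEN`, and its `Location` header names the datanode the namenode chose to serve the read. Below is a minimal sketch using Python's standard `http.client`, which does not follow redirects on its own. The helper names, host, port and path are hypothetical placeholders, and the namenode, not the client, still decides which datanode is returned.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def redirect_target(host, port, path, offset=0, length=None, user=None):
    """Ask the namenode to OPEN `path` and return the redirect Location.

    Hypothetical helper. http.client never follows redirects by itself, so
    the namenode's 307 response (and the datanode URL it names) is visible.
    """
    params = {"op": "OPEN", "offset": offset}
    if length is not None:
        params["length"] = length
    if user is not None:
        params["user.name"] = user  # simple (pseudo) authentication
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", f"/webhdfs/v1{quote(path)}?{urlencode(params)}")
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection is left clean
        if resp.status != 307:
            raise RuntimeError(f"expected a 307 redirect, got {resp.status}")
        return resp.getheader("Location")
    finally:
        conn.close()


def datanode_host(location):
    """Extract the datanode hostname from a redirect Location URL."""
    return urlsplit(location).hostname
```

For example, `datanode_host(redirect_target("namenode.example.com", 50070, "/data/big.log", offset=134217728))` should report which datanode the namenode picked for the byte range starting at the second 128 MiB block. This shows where a read would be served from; it is not a general API for listing all replica locations of a file.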
