hadoop-hdfs-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: HDFS interfaces
Date Tue, 04 Jun 2013 18:35:52 GMT
When you use the HDFS client interface to read a file, it automatically figures out which datanodes
to contact for reading which blocks.  There isn't really a "main" block.  However, I have read
that the first location listed for each block is the "recommended" one for an outside client
to read from.  Normally, an outside client doesn't need to know this information at all, as
the HDFS file interface takes care of it.  An "inside" application such as MapReduce *does*
need to know this information so that it can run tasks on nodes that are "close" to the data
split being processed.  If you are writing a custom ApplicationMaster using YARN, you will
also want to know this.
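As a self-contained illustration of the bookkeeping described above, the sketch below maps a byte offset to its block and picks the first-listed replica.  The block size, host names, and helper methods are all made up for illustration; they are not Hadoop APIs.

```java
// Illustrative sketch only: map a byte offset to a block index and pick the
// first-listed replica host, mirroring the "first location is recommended"
// convention described above.  Block size and host names are hypothetical.
public class BlockPicker {
    static int blockIndexFor(long offset, long blockSize) {
        return (int) (offset / blockSize);
    }

    static String preferredReplica(String[][] hostsPerBlock, int blockIndex) {
        // Each block has a list of replica locations; the first entry is the
        // one an outside client is advised to read from.
        return hostsPerBlock[blockIndex][0];
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;  // a common HDFS block size
        String[][] hosts = { {"dn1", "dn2", "dn3"}, {"dn2", "dn4", "dn1"} };
        long offset = 200L * 1024 * 1024;     // falls inside the second block
        int idx = blockIndexFor(offset, blockSize);
        System.out.println(idx + " " + preferredReplica(hosts, idx));
        // prints "1 dn2"
    }
}
```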


From: Mahmood Naderan [mailto:nt_mahmood@yahoo.com]
Sent: Tuesday, June 04, 2013 12:01 AM
To: user@hadoop.apache.org
Subject: Re: HDFS interfaces

There are many instances of getFileBlockLocations in hadoop/fs. Can you explain which one
is the main one?

> It must be combined with a method of logically splitting the input data along block boundaries,
> and of launching tasks on worker nodes that are close to the data splits

Is this a user-level task or a system-level task?


From: John Lilley <john.lilley@redpoint.net>
To: "user@hadoop.apache.org" <user@hadoop.apache.org>;
Mahmood Naderan <nt_mahmood@yahoo.com>
Sent: Tuesday, June 4, 2013 3:28 AM
Subject: RE: HDFS interfaces


It is in the FileSystem interface:
getFileBlockLocations(Path, long, long)<http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.Path,%20long,%20long)>
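For reference, a minimal sketch of calling it might look like the following.  This assumes a Hadoop client classpath and a reachable filesystem; the path is illustrative, not a real file.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: print the block locations of one file.  Requires a Hadoop client
// classpath and cluster configuration; the path below is hypothetical.
public class ListBlocks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/user/example/data.txt");  // hypothetical path
        FileStatus status = fs.getFileStatus(path);
        BlockLocation[] blocks =
            fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            // getHosts() lists the datanodes holding replicas of this block;
            // per the discussion above, the first entry is the recommended one.
            System.out.println(b.getOffset() + " " + String.join(",", b.getHosts()));
        }
    }
}
```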

This by itself is not sufficient for application programmers to make good use of data locality.
 It must be combined with a method of logically splitting the input data along block boundaries,
and of launching tasks on worker nodes that are close to the data splits.  MapReduce does
both of these things internally, along with the file-format input classes.  For an application
to do so directly, see the new YARN-based interfaces ApplicationMaster and ResourceManager.
 These are, however, very new, and there is little documentation and few examples.
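The "logically splitting along block boundaries" step can be sketched in isolation.  The class and method names below are illustrative, not Hadoop's actual FileInputFormat code, and the file and block sizes are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of block-aligned input splitting, conceptually similar
// to what MapReduce's input classes do internally.  Names are hypothetical.
public class SplitSketch {
    static class Split {
        final long start, length;
        Split(long start, long length) { this.start = start; this.length = length; }
    }

    static List<Split> computeSplits(long fileLen, long blockSize) {
        List<Split> splits = new ArrayList<>();
        for (long off = 0; off < fileLen; off += blockSize) {
            // Each split covers at most one block, so a task scheduled on a
            // node holding that block can read its data locally.
            splits.add(new Split(off, Math.min(blockSize, fileLen - off)));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // A 300 MB file with 128 MB blocks yields splits of 128, 128, and 44 MB.
        List<Split> splits = computeSplits(300 * mb, 128 * mb);
        for (Split s : splits) {
            System.out.println(s.start / mb + " " + s.length / mb);
        }
    }
}
```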


From: Mahmood Naderan [mailto:nt_mahmood@yahoo.com]
Sent: Monday, June 03, 2013 12:09 PM
To: user@hadoop.apache.org
Subject: HDFS interfaces

It is stated in the "HDFS architecture guide" (https://hadoop.apache.org/docs/r1.0.4/hdfs_design.html)

HDFS provides interfaces for applications to move themselves closer to where the data is located.

What are these interfaces, and where are they in the source code? Is there any manual for the

