hadoop-hdfs-user mailing list archives

From Jay Vyas <jayunit...@gmail.com>
Subject Re: HDFS interfaces
Date Tue, 04 Jun 2013 06:38:20 GMT
Looking in the source, it appears that in HDFS the NameNode supports
getting this info directly via the client: it ultimately communicates
block locations to the DFSClient, which is used by the
DistributedFileSystem.

  /**
   * @see ClientProtocol#getBlockLocations(String, long, long)
   */
  static LocatedBlocks callGetBlockLocations(ClientProtocol namenode,
      String src, long start, long length)
      throws IOException {
    try {
      return namenode.getBlockLocations(src, start, length);
    } catch(RemoteException re) {
      throw re.unwrapRemoteException(AccessControlException.class,
                                     FileNotFoundException.class,
                                     UnresolvedPathException.class);
    }
  }
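
John's point below (that getFileBlockLocations must be combined with logically splitting the input along block boundaries, then launching tasks near those splits) can be sketched in plain Java. This is a minimal illustration, not Hadoop code: the class and method names here are hypothetical, and real MapReduce split computation (e.g. in FileInputFormat) also accounts for block host locations and configurable split sizes.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    // Compute logical splits aligned to block boundaries.
    // Each entry is {startOffset, length} for one split of the file.
    static List<long[]> computeSplits(long fileLen, long blockSize) {
        List<long[]> splits = new ArrayList<>();
        long offset = 0;
        while (offset < fileLen) {
            // Never let a split cross a block boundary; the last
            // split may be shorter than a full block.
            long len = Math.min(blockSize, fileLen - offset);
            splits.add(new long[] { offset, len });
            offset += len;
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical 300 MB file with a 128 MB block size -> 3 splits,
        // the last one covering the remaining 44 MB.
        long mb = 1024L * 1024L;
        for (long[] s : computeSplits(300 * mb, 128 * mb)) {
            System.out.println("split @ " + s[0] + " len " + s[1]);
        }
    }
}
```

In real Hadoop, the scheduler would then ask for each split's block locations (via getFileBlockLocations) and try to place the task on one of those hosts.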




On Tue, Jun 4, 2013 at 2:00 AM, Mahmood Naderan <nt_mahmood@yahoo.com> wrote:

> There are many instances of getFileBlockLocations in hadoop/fs. Can you
> explain which one is the main one?
>
> >It must be combined with a method of logically splitting the input data
> >along block boundaries, and of launching tasks on worker nodes that are
> >close to the data splits
>
> Is this a user-level task or a system-level task?
>
>
> Regards,
> Mahmood
>
>   ------------------------------
>  *From:* John Lilley <john.lilley@redpoint.net>
> *To:* "user@hadoop.apache.org" <user@hadoop.apache.org>; Mahmood Naderan <
> nt_mahmood@yahoo.com>
> *Sent:* Tuesday, June 4, 2013 3:28 AM
> *Subject:* RE: HDFS interfaces
>
>  Mahmood,
>
> It is in the FileSystem interface:
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.Path,%20long,%20long)
>
> This by itself is not sufficient for application programmers to make good
> use of data locality.  It must be combined with a method of logically
> splitting the input data along block boundaries, and of launching tasks on
> worker nodes that are close to the data splits.  MapReduce does both of
> these things internally along with the file-format input classes.  For an
> application to do so directly, see the new YARN-based interfaces
> ApplicationMaster and ResourceManager. These are, however, very new, and
> there is little documentation and there are few examples.
>
> john
>
>  *From:* Mahmood Naderan [mailto:nt_mahmood@yahoo.com]
> *Sent:* Monday, June 03, 2013 12:09 PM
> *To:* user@hadoop.apache.org
> *Subject:* HDFS interfaces
>
>  Hello,
>  It is stated in the "HDFS architecture guide" (
> https://hadoop.apache.org/docs/r1.0.4/hdfs_design.html) that
>
>  *HDFS provides interfaces for applications to move themselves closer to
> where the data is located. *
>
> What are these interfaces, and where are they in the source code? Is
> there any manual for them?
>
>   Regards,
> Mahmood
>
>
>


-- 
Jay Vyas
http://jayunit100.blogspot.com
