hadoop-hdfs-user mailing list archives

From Ling Kun <lkun.e...@gmail.com>
Subject Re: What is the difference between URI, Home Directory, Working Directory in FileSystem.java or HDFS
Date Fri, 19 Apr 2013 09:38:32 GMT
Dear Daryn Sharp,
   Your reply helps me a lot with reading the HDFS and FileSystem code.


Ling Kun

On Thu, Apr 11, 2013 at 10:53 PM, Daryn Sharp <daryn@yahoo-inc.com> wrote:

> On Apr 11, 2013, at 5:33 AM, Ling Kun wrote:
> > Dear all,
> >    I am a little confused about the URI, Home Directory and Working
> > Directory in FileSystem.java or HDFS.
> >
> >   I have listed my understanding of these concepts; can someone please
> > point out whether I am correct?  Thanks.
> >
> >    The Home directory: This is usually a directory for a specific Hadoop
> > user, so its path is user specific. In HDFS, it is like
> > hdfs://NameNode:port/user/USERNAME.
> Correct.
> >    The URI: Is this the root of the distributed filesystem? For HDFS, it
> > is just hdfs://NameNode:port/, and each file/directory in the distributed
> > filesystem is just a file or subdirectory under this path.
> Generally correct.  However, I'd strongly suggest avoiding the use of URIs
> directly.  It's better to obtain your filesystems via
> path.getFileSystem(conf) - it will extract the URI for the filesystem
> automatically.  See below for the correct definition of a Path.
> >    The working directory: I am a little confused about this variable. At
> > a given time, there exists only one instance of the filesystem class, and
> > the working dir is private state of the FS. While a job is running,
> > Hadoop switches among several dirs (such as the shared system dir, home
> > dir, or input/output dir), and the working dir is modified on each
> > switch.
> Correct.
> >    Although I have looked through the related documentation, I am still a
> > little confused about the java.net.URI, java.io.File and
> > org.apache.hadoop.fs.Path classes. It seems a URI could be
> > hdfs://XXX/XXX/FILENAME, while a Path can only be the path without the
> > scheme, hostname and port.  As for the File class, it is just an object
> > for a specific file.
> Your understanding of Path is incorrect.  Path is really just a veneer
> over a URI.  A Path can be qualified with a scheme/authority, or just be
> absolute or relative.  If a Path is not scheme qualified, it uses the
> defaultFS.  If the Path is not absolute, it's qualified against the working
> directory.  Path provides some niceties like not requiring percent encoding
> in the path portion of the URI, and allows use of glob chars and the
> quoting thereof.
> I hope this helps!
> Daryn
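
The qualification rules Daryn describes (fall back to the defaultFS when a
path has no scheme, resolve against the working directory when it is not
absolute, and no need to percent-encode by hand) can be illustrated with a
rough sketch. This is not Hadoop's implementation: it only mimics the rules
with plain java.net.URI. The class name PathQualifySketch, the qualify()
helper, the host "namenode", port 8020 and user "alice" are all made up for
illustration, and glob handling is not covered.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PathQualifySketch {

    /**
     * Hypothetical helper that mimics the rules from the message above.
     * It is NOT org.apache.hadoop.fs.Path, only an illustration of the idea.
     */
    public static URI qualify(String path, URI defaultFs, String workingDir) {
        String scheme;
        String authority;
        String rest = path;

        int schemeEnd = path.indexOf("://");
        if (schemeEnd > 0) {
            // Already scheme qualified: keep its own scheme and authority.
            scheme = path.substring(0, schemeEnd);
            int pathStart = path.indexOf('/', schemeEnd + 3);
            String auth = pathStart < 0
                    ? path.substring(schemeEnd + 3)
                    : path.substring(schemeEnd + 3, pathStart);
            authority = auth.isEmpty() ? null : auth;
            rest = pathStart < 0 ? "/" : path.substring(pathStart);
        } else {
            // Not scheme qualified: fall back to the default filesystem.
            scheme = defaultFs.getScheme();
            authority = defaultFs.getAuthority();
        }

        // Not absolute: resolve against the working directory.
        if (!rest.startsWith("/")) {
            rest = workingDir.endsWith("/") ? workingDir + rest
                                            : workingDir + "/" + rest;
        }

        try {
            // The multi-argument constructor quotes illegal characters
            // (such as spaces) in the path, so callers pass raw names.
            return new URI(scheme, authority, rest, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot qualify: " + path, e);
        }
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:8020/"); // assumed defaultFS
        String workingDir = "/user/alice";                   // assumed working dir

        // Relative, unqualified path with a space in the file name.
        System.out.println(qualify("data/my file.txt", defaultFs, workingDir));
        // prints hdfs://namenode:8020/user/alice/data/my%20file.txt

        // Absolute, unqualified path: only the defaultFS is applied.
        System.out.println(qualify("/tmp/out", defaultFs, workingDir));
        // prints hdfs://namenode:8020/tmp/out

        // Fully qualified path: left on its own filesystem.
        System.out.println(qualify("hdfs://othernn:9000/logs", defaultFs, workingDir));
        // prints hdfs://othernn:9000/logs
    }
}
```

In real Hadoop code the same effect would normally come from the Path and
FileSystem classes themselves, e.g. obtaining the filesystem via
path.getFileSystem(conf) as suggested above, rather than from hand-written
URI handling like this.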

