hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "MountableHDFS" by petewyckoff
Date Thu, 04 Sep 2008 00:27:44 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by petewyckoff:
http://wiki.apache.org/hadoop/MountableHDFS

The comment on the change is:
improve  fuse-dfs instructions and add supported operating systems and fuse 

------------------------------------------------------------------------------
  = Mounting HDFS =
  
  {{{
- [mymachine] ~ > df -kh /export/hdfs/
+ [machine1] ~ > df -kh /export/hdfs/
  Filesystem            Size  Used Avail Use% Mounted on
  fuse                  4.1P  642T  3.5P  21% /export/hdfs
- [mymachine] ~ > ls /export/hdfs/
+ [machine1] ~ > ls /export/hdfs/
  home  tmp  Trash  user  usr  var
  }}}
  
  
  
- These projects allow HDFS to be mounted (on most flavors of Unix) as a standard file system
using the mount command.  Once mounted, the user can operate on an instance of hdfs using
standard Unix utilities such as 'ls', 'cd', 'cp', 'mkdir', 'find', 'grep', or use standard
Posix libraries like open, write, read, close from C, C++, Python, Ruby, Perl, Java, bash,
etc. 
+ These projects (enumerated below) allow HDFS to be mounted (on most flavors of Unix) as
a standard file system using the mount command.  Once mounted, the user can operate on an
instance of hdfs using standard Unix utilities such as 'ls', 'cd', 'cp', 'mkdir', 'find',
'grep', or use standard Posix libraries like open, write, read, close from C, C++, Python,
Ruby, Perl, Java, bash, etc. 
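
Once mounted, ordinary POSIX calls work unchanged. A minimal sketch in Python (the mount point
is an assumption -- any writable directory under the mount, e.g. /export/hdfs/tmp, behaves the
same as a local one):

```python
import os

def posix_round_trip(mount_point):
    """Write and read a file using only standard POSIX calls.

    On a fuse-dfs mount this behaves exactly as on a local filesystem;
    the mount_point argument is illustrative, not a fixed path.
    """
    path = os.path.join(mount_point, "fuse_dfs_demo.txt")
    # open/write/close, just as against any other filesystem
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    os.write(fd, b"hello from posix\n")
    os.close(fd)
    # open/read/close the same file back
    fd = os.open(path, os.O_RDONLY)
    data = os.read(fd, 1024)
    os.close(fd)
    os.unlink(path)
    return data
```

For example, `posix_round_trip("/export/hdfs/tmp")` once the mount is up.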
  
- == Options ==
+ They are all based on the Filesystem in Userspace project FUSE ([http://fuse.sourceforge.net/]).
The Webdav-based one can be used with other webdav tools, but it still requires FUSE to actually
mount.
+ 
+ Note that a great thing about FUSE is you can export a fuse mount using NFS, so you can
use fuse-dfs to mount hdfs on one machine and then export that using NFS. The bad news is
that fuse relies on the kernel's inode cache since fuse is path-based and not inode-based.
If an inode is flushed from the kernel cache on the server, NFS clients get hosed; they try
doing a read or an open with an inode the server doesn't have a mapping for and thus NFS chokes.
So, while the NFS route gets you started quickly, for production it is more robust to automount
fuse-dfs on all the machines from which you want to access hdfs.
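
A sketch of the quick-start NFS route described above (hostnames and paths are examples only,
following the mount and exports syntax used later on this page):

{{{
# on the NFS server machine: mount hdfs with fuse-dfs
fuse_dfs_wrapper.sh dfs://hadoop_server1.foo.com:9000 /export/hdfs

# then add the mount point to /etc/exports on that machine:
/export/hdfs *.foo.com(no_root_squash,rw,fsid=1,sync)

# and re-export: exportfs -ra
}}}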
+ 
+ == Projects ==
  
   * contrib/fuse-dfs is built on fuse, some C glue, libhdfs and the hadoop-dev.jar
   * fuse-j-hdfs is built on fuse, fuse for java, and the hadoop-dev.jar
   * hdfs-fuse - a google code project is very similar to contrib/fuse-dfs
   * webdav - hdfs exposed as a webdav resource
  
+ == Supported Operating Systems ==
+ 
+ Linux 2.4, 2.6, FreeBSD, NetBSD, MacOS X, Windows, Open Solaris. See: [http://fuse.sourceforge.net/wiki/index.php/OperatingSystems]
+ 
+ 
  == Fuse-DFS ==
  
- Supports reads, writes, and directory operations (e.g., cp, ls, more, cat, find, less, rm,
mkdir, mv, rmdir).  Does not support touch, chmod, chown, and it does not respect permissions
and shows all files as owned by 'nobody'.
+ Supports reads, writes, and directory operations (e.g., cp, ls, more, cat, find, less, rm,
mkdir, mv, rmdir).  Things like touch, chmod, chown, and permissions are in the works. Fuse-dfs
currently shows all files as owned by nobody.
+ 
+ == Contributing ==
+ 
+ It's pretty straightforward to add functionality to fuse-dfs, as fuse makes things relatively
simple. Some tasks also require augmenting libhdfs to expose more hdfs functionality
to C. See [http://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&pid=12310240&sorter/order=DESC&sorter/field=priority&resolution=-1&component=12312376
 contrib/fuse-dfs JIRAs]
+ 
+ == Requirements ==
+ 
+  * Hadoop with compiled libhdfs.so
+  * Linux kernel > 2.6.9 with fuse built in (the default), or Fuse 2.7.x/2.8.x installed.
See: [http://fuse.sourceforge.net/]
+  * `modprobe fuse` to load the kernel module
+  * fuse-dfs executable (see below)
+  * fuse_dfs_wrapper.sh installed in /bin or other appropriate location (see below)
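
A quick, illustrative way to check that the requirements above are in place (paths are
assumptions; adjust for your layout):

{{{
modprobe fuse                        # load the fuse kernel module
ls -l /dev/fuse                      # device node should exist after the load
find $HADOOP_HOME -name libhdfs.so   # confirm libhdfs was compiled
which fuse_dfs_wrapper.sh            # wrapper on the PATH (e.g. in /bin)
}}}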
  
  
  === BUILDING ===
  
+    1. in HADOOP_HOME: `ant compile-libhdfs -Dlibhdfs=1`
+    2. in HADOOP_HOME: `ant package` to deploy libhdfs
- Requirements:
- 
-    1. a Linux kernel > 2.6.9 or a kernel module from FUSE - i.e., you compile it yourself
and then modprobe it. Better off with the former option if possible.  (Note for now if you
use the kernel with fuse included, it doesn't allow you to export this through NFS so be warned.
See the FUSE email list for more about this.)
- 
-    2. FUSE should be installed in /usr/local or FUSE_HOME ant environment variable
- 
- To build:
- 
-    1. in HADOOP_HOME: `ant compile-contrib -Dcompile.c++=1 -Dfusedfs=1`
+    3. in HADOOP_HOME: `ant compile-contrib -Dlibhdfs=1 -Dfusedfs=1`
  
  NOTE: for amd64 architecture, libhdfs will not compile unless you edit
  the Makefile in src/c++/libhdfs/Makefile and set OS_ARCH=amd64
  (probably the same for others too). See [https://issues.apache.org/jira/browse/HADOOP-3344
HADOOP-3344]
  
+ Common build problems include not finding libjvm.so in JAVA_HOME/jre/lib/OS_ARCH/server
or not finding fuse in FUSE_HOME or /usr/local.
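
Pointing the build and runtime at the right places usually looks something like the following
(all values are examples; substitute your own JDK, fuse, and architecture paths):

{{{
export JAVA_HOME=/usr/java/default          # must contain jre/lib/OS_ARCH/server/libjvm.so
export FUSE_HOME=/usr/local                 # where fuse headers/libs were installed
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:$FUSE_HOME/lib:$LD_LIBRARY_PATH
}}}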
+ 
  
  === CONFIGURING ===
  
- Look at all the paths in fuse_dfs_wrapper.sh and either correct them or set them in your
environment before running. (note for automount and mount as root, you probably cannnot control
the environment, so best to set them in the wrapper)
+ Look at all the paths in fuse_dfs_wrapper.sh and either correct them or set them in your
environment before running. (note for automount and mount as root, you probably cannot control
the environment, so best to set them in the wrapper)
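
For example, the kinds of paths the wrapper needs (the values below are illustrative, not
canonical -- check fuse_dfs_wrapper.sh itself for the exact variable names it expects):

{{{
export JAVA_HOME=/usr/java/default
export HADOOP_HOME=/usr/local/hadoop
export LD_LIBRARY_PATH=$HADOOP_HOME/build/libhdfs:$JAVA_HOME/jre/lib/amd64/server:/usr/local/lib
# CLASSPATH must include hadoop and its dependency jars
for jar in $HADOOP_HOME/*.jar $HADOOP_HOME/lib/*.jar; do
  CLASSPATH=$CLASSPATH:$jar
done
export CLASSPATH
}}}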
  
  === INSTALLING ===
  
- 1. `mkdir /mnt/dfs` (or wherever you want to mount it)
+ 1. `mkdir /export/hdfs` (or wherever you want to mount it)
  
- 2. `fuse_dfs_wrapper.sh dfs://hadoop_server1.foo.com:9000 /mnt/dfs -d` and from another
terminal, try `ls /mnt/dfs`
+ 2. `fuse_dfs_wrapper.sh dfs://hadoop_server1.foo.com:9000 /export/hdfs -d` and from another
terminal, try `ls /export/hdfs`
  
  If step 2 works, try again without debug mode (i.e., drop -d)
  
  (note - common problems are that libhdfs.so, libjvm.so, or libfuse.so is not on
your LD_LIBRARY_PATH, or that your CLASSPATH does not contain hadoop and the other required jars.)
+ 
+ Also note, fuse-dfs will write error/warn messages to the syslog - typically in /var/log/messages
+ 
+ You can use fuse-dfs to mount multiple hdfs instances by just changing the server/port name
and directory mount point above.
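
For example (server names and the second mount point are placeholders):

{{{
fuse_dfs_wrapper.sh dfs://hadoop_server1.foo.com:9000 /export/hdfs
fuse_dfs_wrapper.sh dfs://hadoop_server2.foo.com:9000 /export/hdfs2
}}}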
  
  === DEPLOYING ===
  
@@ -63, +84 @@

  
  1. add the following to /etc/fstab
  {{{
- fuse_dfs#dfs://hadoop_server.foo.com:9000 /mnt/dfs fuse -oallow_other,rw,-ousetrash 0 0
+ fuse_dfs#dfs://hadoop_server.foo.com:9000 /export/hdfs fuse -oallow_other,rw,-ousetrash
0 0
  }}}
  
- 2. Mount using: `mount /mnt/dfs`. Expect problems with not finding fuse_dfs. You will need
to probably add this to /sbin and then problems finding the above 3 libraries. Add these using
ldconfig.
+ 2. Mount using: `mount /export/hdfs`. Expect problems with mount not finding fuse_dfs; you
will probably need to copy it into /sbin. Then expect problems finding the above 3 libraries;
make them visible using ldconfig.
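
A sketch of those fix-ups (the source path for fuse_dfs and the library directories are
examples; they will differ on your system):

{{{
cp src/contrib/fuse-dfs/src/fuse_dfs /sbin/        # so mount can find fuse_dfs
# register the library directories with the dynamic linker
echo /usr/local/lib                    >  /etc/ld.so.conf.d/fuse-dfs.conf
echo $HADOOP_HOME/build/libhdfs        >> /etc/ld.so.conf.d/fuse-dfs.conf
echo $JAVA_HOME/jre/lib/amd64/server   >> /etc/ld.so.conf.d/fuse-dfs.conf
ldconfig
}}}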
  
  
  Fuse DFS takes the following mount options (i.e., on the command line or in the comma-separated
list of options in /etc/fstab):
@@ -98, +119 @@

  
  Add the following to /etc/exports:
  {{{
- /mnt/hdfs *.foo.com(no_root_squash,rw,fsid=1,sync)
+ /export/hdfs *.foo.com(no_root_squash,rw,fsid=1,sync)
  }}}
  NOTE - you cannot export this with a FUSE module built into the kernel
  - e.g., kernel 2.6.17. For info on this, refer to the FUSE wiki.
  
- === ADVANCED ===
- 
- you may want to ensure certain directories cannot be deleted from the
- shell until the FS has permissions. You can set this in the build.xml
- file in src/contrib/fuse-dfs/build.xml
  
  === RECOMMENDATIONS ===
  
@@ -127, +143 @@

  3. Reads are ~20-30% slower even with the read buffering. 
  
  4. fuse-dfs and underlying libhdfs have no support for permissions. See [https://issues.apache.org/jira/browse/HADOOP-3536
HADOOP-3536] 
+ 
  
  == Fuse-j-HDFS ==
  
