hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "MountableHDFS" by CraigMacdonald
Date Mon, 18 Aug 2008 21:34:53 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by CraigMacdonald:
http://wiki.apache.org/hadoop/MountableHDFS

The comment on the change is:
minor formatting changes; link to permissions issues

------------------------------------------------------------------------------
  
  Requirements:
  
+    1. a Linux kernel > 2.6.9, or the kernel module from FUSE - i.e., you compile it yourself
and then modprobe it. The former option is better if possible. (Note: for now, if you use a
kernel with fuse included, it does not allow you to export this through NFS, so be warned.
See the FUSE mailing list for more about this.) A quick check is sketched after this list.
-    1. a Linux kernel > 2.6.9 or a kernel module from FUSE - i.e., you
-    compile it yourself and then modprobe it. Better off with the
-    former option if possible.  (Note for now if you use the kernel
-    with fuse included, it doesn't allow you to export this through NFS
-    so be warned. See the FUSE email list for more about this.)
  
-    2. FUSE should be installed in /usr/local or FUSE_HOME ant
+    2. FUSE should be installed in /usr/local, or its location set in the FUSE_HOME ant environment variable
-    environment variable
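
 A quick way to check the kernel side (these are standard commands; whether fuse ships as a loadable
 module on your distribution is an assumption):
 {{{
 uname -r            # kernel version; you want something newer than 2.6.9
 modprobe fuse       # load the module (skip if fuse is built into your kernel)
 lsmod | grep fuse   # confirm the module is loaded
 }}}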
  
  To build:
  
-    1. in HADOOP_HOME: ant compile-contrib -Dcompile.c++=1 -Dfusedfs=1
+    1. in HADOOP_HOME: `ant compile-contrib -Dcompile.c++=1 -Dfusedfs=1`
- 
  
  NOTE: for the amd64 architecture, libhdfs will not compile unless you edit
  src/c++/libhdfs/Makefile and set OS_ARCH=amd64
- (probably the same for others too).
+ (probably the same for others too). See [https://issues.apache.org/jira/browse/HADOOP-3344
HADOOP-3344]
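
 A minimal sketch of that edit (this assumes OS_ARCH is assigned on its own line in the Makefile;
 open the file and check before running anything like this):
 {{{
 # run from HADOOP_HOME; rewrites the OS_ARCH assignment in the libhdfs Makefile
 sed -i 's/^OS_ARCH *=.*/OS_ARCH=amd64/' src/c++/libhdfs/Makefile
 }}}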
  
  
  === CONFIGURING ===
  
+ Look at all the paths in fuse_dfs_wrapper.sh and either correct them or set them in your
environment before running. (Note: for automount and mounting as root, you probably cannot
control the environment, so it is best to set them in the wrapper.)
- Look at all the paths in fuse_dfs_wrapper.sh and either correct them
- or set them in your environment before running. (note for automount
- and mount as root, you probably cannnot control the environment, so
- best to set them in the wrapper)
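
 A minimal sketch of the kind of paths involved (the variable names and locations below are
 illustrative assumptions - open fuse_dfs_wrapper.sh and match whatever it actually reads):
 {{{
 export JAVA_HOME=/usr/lib/jvm/java-6-sun    # where your JDK lives
 export HADOOP_HOME=/usr/local/hadoop        # your Hadoop install or checkout
 export OS_ARCH=amd64                        # architecture, if the wrapper uses it
 }}}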
  
  === INSTALLING ===
  
- 1. mkdir /mnt/dfs (or wherever you want to mount it)
+ 1. `mkdir /mnt/dfs` (or wherever you want to mount it)
  
+ 2. `fuse_dfs_wrapper.sh dfs://hadoop_server1.foo.com:9000 /mnt/dfs -d` and from another
terminal, try `ls /mnt/dfs`
- 2. fuse_dfs_wrapper.sh dfs://hadoop_server1.foo.com:9000 /mnt/dfs -d
- ; and from another terminal, try ls /mnt/dfs
  
  If step 2 works, try again without the debug option, i.e., drop -d
  
+ (Note: common problems are that libhdfs.so, libjvm.so, or libfuse.so is not on your
LD_LIBRARY_PATH, or that your CLASSPATH does not contain the hadoop and other required jars.)
- (note - common problems are that you don't have libhdfs.so or
- libjvm.so or libfuse.so on your LD_LIBRARY_PATH, and your CLASSPATH
- does not contain hadoop and other required jars.)
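
 A minimal sketch of fixing that from the shell (every path below is an assumption - point these
 at wherever your build actually put the libraries and jars):
 {{{
 # native libraries: libhdfs.so, libjvm.so, libfuse.so
 export LD_LIBRARY_PATH=$HADOOP_HOME/build/libhdfs:$JAVA_HOME/jre/lib/amd64/server:/usr/local/lib:$LD_LIBRARY_PATH
 # hadoop jars and their dependencies
 for jar in $HADOOP_HOME/*.jar $HADOOP_HOME/lib/*.jar; do CLASSPATH=$CLASSPATH:$jar; done
 export CLASSPATH
 }}}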
  
  === DEPLOYING ===
  
  in a root shell do the following:
  
- 1. add the following to /etc/fstab -
+ 1. add the following to /etc/fstab
+ {{{
-   fuse_dfs#dfs://hadoop_server.foo.com:9000 /mnt/dfs fuse
+ fuse_dfs#dfs://hadoop_server.foo.com:9000 /mnt/dfs fuse -oallow_other,rw,-ousetrash 0 0
-   -oallow_other,rw,-ousetrash 0 0
+ }}}
  
+ 2. Mount using `mount /mnt/dfs`. Expect problems with fuse_dfs not being found; you will
probably need to copy it to /sbin. Then expect problems finding the three libraries above;
register their directories with ldconfig (see the sketch after this list).
- 2. mount /mnt/dfs Expect problems with not finding fuse_dfs. You will
-    need to probably add this to /sbin and then problems finding the
-    above 3 libraries. Add these using ldconfig.
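
 A minimal sketch of that setup (the binary and library locations are assumptions, and it assumes
 your distribution reads /etc/ld.so.conf.d/):
 {{{
 cp $HADOOP_HOME/contrib/fuse-dfs/fuse_dfs /sbin/
 echo "$HADOOP_HOME/build/libhdfs" > /etc/ld.so.conf.d/fuse_dfs.conf
 echo "$JAVA_HOME/jre/lib/amd64/server" >> /etc/ld.so.conf.d/fuse_dfs.conf
 ldconfig
 }}}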
  
  
  Fuse DFS takes the following mount options (i.e., on the command line or in the comma-separated
list of options in /etc/fstab); an example invocation is sketched below, after the defaults:
- 
+ {{{
  -oserver=%s  (optional place to specify the server but in fstab use the format above)
  -oport=%d (optional port see comment on server option)
  -oentry_timeout=%d (how long directory entries are cached by fuse in seconds - see fuse
docs)
@@ -85, +72 @@

  -onotrash (opposite of usetrash)
  -odebug (do not daemonize - aka -d in fuse speak)
  -obig_writes (use fuse big_writes option so as to allow better performance of writes on
kernels >= 2.6.26)
- 
+ }}}
  The defaults are:
- 
+ {{{
  entry,attribute_timeouts = 60 seconds
  rdbuffer = 10 MB
  protected = null
  debug = 0
  notrash
  private = 0
- 
+ }}}
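
 For example, mounting from the command line with the trash enabled and big writes turned on might
 look like the following (a sketch based on the options listed above; the server name is the same
 illustration used earlier):
 {{{
 fuse_dfs_wrapper.sh dfs://hadoop_server.foo.com:9000 /mnt/dfs -ousetrash -obig_writes
 }}}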
  === EXPORTING ===
  
  Add the following to /etc/exports:
- 
+ {{{
-   /mnt/hdfs *.foo.com(no_root_squash,rw,fsid=1,sync)
+ /mnt/hdfs *.foo.com(no_root_squash,rw,fsid=1,sync)
- 
+ }}}
  NOTE - you cannot export this with a FUSE module built into the kernel
  - e.g., kernel 2.6.17. For info on this, refer to the FUSE wiki.
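
 After editing /etc/exports, re-export and mount from a client, for example (the gateway host name
 and client mount point here are just placeholders):
 {{{
 exportfs -ra                                            # on the machine with the fuse mount
 mount -t nfs hdfs_gateway.foo.com:/mnt/hdfs /mnt/hdfs   # on an NFS client in *.foo.com
 }}}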
  
@@ -112, +99 @@

  
  === RECOMMENDATIONS ===
  
- 1. From /bin, ln -s $HADOOP_HOME/contrib/fuse-dfs/fuse_dfs* .
+ 1. From /bin, `ln -s $HADOOP_HOME/contrib/fuse-dfs/fuse_dfs* .`
+ 
  2. Always start with debug on so you can see if you are missing a classpath or something
like that.
+ 
  3. use -obig_writes
  
- === PERFORMANCE ===
+ === KNOWN ISSUES ===
  
+ 1. If you alias `ls` to `ls --color=auto` and try listing a directory with thousands of files,
expect it to be slow; at tens of thousands, expect it to be very, very slow. This is because
`--color=auto` causes ls to stat every file in the directory. Since fuse-dfs does not cache
attribute entries when doing a readdir,
- 1. if you alias ls to ls --color=auto and try listing a directory with lots (over thousands)
of files, expect it to be slow and at 10s of thousands, expect it to be
-  very very slow.  This is because --color=auto causes ls to stat every file in the directory.
Since fuse-dfs does not cache attribute entries when doing a readdir, 
- this is very slow. see https://issues.apache.org/jira/browse/HADOOP-3797 
+ this is very slow. See [https://issues.apache.org/jira/browse/HADOOP-3797 HADOOP-3797]. A
workaround is sketched after this list.
  
+ 2. Writes are approximately 33% slower than the DFSClient. TBD how to optimize this. See
[https://issues.apache.org/jira/browse/HADOOP-3805 HADOOP-3805] - try using -obig_writes if
on a >= 2.6.26 kernel; it should perform much better, since bigger writes imply less context
switching.
- 2. Writes are approximately 33% slower than the DFSClient. TBD how to optimize this. see:
https://issues.apache.org/jira/browse/HADOOP-3805 - try using -obig_writes
-  and if on a >2.6.26 kernel, should perform much better since bigger writes implies less
context switching.
  
  3. Reads are ~20-30% slower even with the read buffering. 
  
- 
+ 4. fuse-dfs and the underlying libhdfs have no support for permissions. See [https://issues.apache.org/jira/browse/HADOOP-3536
HADOOP-3536]
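
 As a workaround for issue 1, bypass the alias when listing large directories (the directory name
 is just a placeholder):
 {{{
 \ls /mnt/dfs/big_directory                # a leading backslash skips shell aliases
 ls --color=never /mnt/dfs/big_directory
 }}}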
  
  == Fuse-j-HDFS ==
  
- see https://issues.apache.org/jira/browse/HADOOP-4
+ see [https://issues.apache.org/jira/browse/HADOOP-4 HADOOP-4]
  
  == HDFS-FUSE ==
  
