hadoop-common-dev mailing list archives

From "Craig Macdonald (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4) tool to mount dfs on linux
Date Thu, 21 Feb 2008 10:45:19 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571004#action_12571004 ]

Craig Macdonald commented on HADOOP-4:

Hi Pete,

Definitely using the latest tar this time ;-)
My first time using the new build system - looks good! 

Some comments:

1. Firstly, I shouldn't have deleted my last comment, though it was clearly in error as I
was reading the wrong version of fuse_dfs.c. In your comments, can you say which file you've
just uploaded?

For posterity, previous comment was:


I will try the newer version tomorrow when at work. I note that fi->fh isn't used or set in
dfs_read in your latest version. Could we set it in dfs_open for O_RDONLY, and then use
it if available?

I'm not clear on the semantics of hdfsPread: does it assume that offset is after the previous
read? If so, then we need to check that the current read on a file is strictly after the previous
read for a previously open FH to be of use. hdfsTell could be of use here.
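
A minimal sketch of the fi->fh idea above, not taken from fuse_dfs.c. To stay self-contained it uses POSIX open/pread as a stand-in for the libhdfs calls, and a hypothetical struct file_info_sketch in place of struct fuse_file_info; all sketch_* names are assumptions made for illustration. The pattern is: open once for read-only access, stash the handle in fh, and let the read callback use it when present. Because pread takes an explicit offset, reads through the cached handle do not need to be sequential; whether hdfsPread behaves the same way should be checked against hdfs.h.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the two fields of struct fuse_file_info used here. */
struct file_info_sketch {
    int flags;     /* open flags, as FUSE passes them to the open callback */
    uint64_t fh;   /* opaque per-open slot, like fi->fh; 0 means "not set" */
};

/* Open callback: for read-only opens, open once and cache the handle. */
static int sketch_open(const char *path, struct file_info_sketch *fi)
{
    assert(path != NULL && fi != NULL);
    fi->fh = 0;
    if ((fi->flags & O_ACCMODE) == O_RDONLY) {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;
        fi->fh = (uint64_t)fd + 1;   /* +1 so that 0 still means "no handle" */
    }
    return 0;
}

/* Read callback: use the cached handle if available, else open per call. */
static ssize_t sketch_read(const char *path, char *buf, size_t size,
                           off_t offset, struct file_info_sketch *fi)
{
    assert(path != NULL && buf != NULL && fi != NULL);
    if (fi->fh != 0)
        return pread((int)(fi->fh - 1), buf, size, offset);

    /* Fallback path: open, read, and close on every call. */
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, buf, size, offset);
    close(fd);
    return n;
}

/* Release callback: close the cached handle, if any. */
static void sketch_release(struct file_info_sketch *fi)
{
    assert(fi != NULL);
    if (fi->fh != 0) {
        close((int)(fi->fh - 1));
        fi->fh = 0;
    }
}
```

In a real FUSE module the same three functions would be registered as the open, read, and release operations, with the cached value being an hdfsFile rather than a file descriptor.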

2. With respect to the read speed, this is indeed a bit faster in our test setting (nearer
6MB/sec), but not yet similar to the Hadoop fs shell (about 10.5MB/sec). Fuse version 2.7.2
# time bin/hadoop fs -cat /user/craigm/data.df > /dev/null 

real    0m50.347s
user    0m16.023s
sys     0m6.644s

# time cat /misc/hdfs/user/craigm/data.df > /dev/null 

real    1m31.263s
user    0m0.131s
sys     0m2.384s

I'm trying to measure the CPU taken by fuse_dfs for the same read, so we know how much CPU
time it burns.

Can I ask how your timing test compares to using the Hadoop fs shell on the same machine?
When reading, the CPU on the client is used 45%ish, similar to the Hadoop fs shell CPU use.

I feel it would be good to aim for performance similar to the Hadoop fs shell, as this seems
reasonable compared to NFS in my test setting, and should scale better as the number of concurrent
reads increases, given available wire bandwidth.
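
For repeatable comparisons between the two read paths, a small reader that reports throughput alongside CPU time can replace the manual `time cat` runs. The following is a sketch under stated assumptions: struct read_stats, measure_read, and mb_per_sec are hypothetical names, and getrusage(RUSAGE_SELF) only accounts for the reading process itself, so the CPU burned by the fuse_dfs process would still have to be measured separately.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result record for one sequential read of a file. */
struct read_stats {
    long long bytes;   /* total bytes read */
    double wall_sec;   /* elapsed wall-clock time */
    double user_sec;   /* user CPU time of this process */
    double sys_sec;    /* system CPU time of this process */
};

static double tv_to_sec(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Read the whole file sequentially, discarding the data, and time it. */
static int measure_read(const char *path, struct read_stats *out)
{
    static char buf[1 << 16];
    struct rusage r0, r1;
    struct timespec t0, t1;
    long long total = 0;
    ssize_t n;

    assert(path != NULL && out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    getrusage(RUSAGE_SELF, &r0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    getrusage(RUSAGE_SELF, &r1);
    close(fd);
    if (n < 0)
        return -1;

    out->bytes = total;
    out->wall_sec = (double)(t1.tv_sec - t0.tv_sec)
                  + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    out->user_sec = tv_to_sec(r1.ru_utime) - tv_to_sec(r0.ru_utime);
    out->sys_sec = tv_to_sec(r1.ru_stime) - tv_to_sec(r0.ru_stime);
    return 0;
}

/* Throughput in MB/sec (1 MB = 1024 * 1024 bytes); 0 if no time elapsed. */
static double mb_per_sec(const struct read_stats *s)
{
    assert(s != NULL);
    if (s->wall_sec <= 0.0)
        return 0.0;
    return (double)s->bytes / (1024.0 * 1024.0) / s->wall_sec;
}
```

Pointing measure_read at the same file through the FUSE mount and through a local copy would give directly comparable figures, subject to the usual caveat that page caching can distort repeated runs.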

3. With respect to the build system, it could be clearer what --with-dfspath= is meant to
point to. src/Makefile.am seems to assume that include files are at ${dfspath}/include/linux
and the hdfs.so at ${dfspath}/include/shared. This isn't how the Hadoop installation is laid
out. Perhaps it would be better if we could give an option pointing to the Hadoop installation,
with the paths derived from there?

4. src/Makefile.am assumes an amd64 architecture. Same problem I noted in my shell script
about guessing the locations of the JRE shared lib files.

5 (minor). The last tar.gz had a link to aclocal.m4 in the external folder that was absolute,
i.e. to your installation. It should be deleted when building the tar file.

6 (minor). Update print_usage if you're happy with the specification of filesystem options.
I made no changes to my shell script or my autofs mount for this version to work :-)



> tool to mount dfs on linux
> --------------------------
>                 Key: HADOOP-4
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.5.0
>         Environment: linux only
>            Reporter: John Xing
>            Assignee: Doug Cutting
>         Attachments: fuse-dfs.tar.gz, fuse-dfs.tar.gz, fuse-dfs.tar.gz, fuse-dfs.tar.gz,
> fuse-dfs.tar.gz, fuse-hadoop-0.1.0_fuse-j.2.2.3_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.0_fuse-j.2.4_hadoop.0.5.0.tar.gz,
> fuse-hadoop-0.1.1.tar.gz, fuse-j-hadoopfs-03.tar.gz, fuse_dfs.c, fuse_dfs.c, fuse_dfs.c, fuse_dfs.c,
> fuse_dfs.sh, Makefile
> tool to mount dfs on linux

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
