hadoop-common-dev mailing list archives

From "Craig Macdonald (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-834) Export the HDFS file system through a NFS protocol
Date Tue, 11 Dec 2007 09:49:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550377 ]

Craig Macdonald commented on HADOOP-834:

Hello Dhruba,

It's best I explain what I did to get where I am. Originally, when I read this issue, I
did not notice the attachment, hence the duplicated effort in implementing an NFS-DFS gateway.

Firstly, like you, I looked around for an existing Java NFS implementation that I liked. I
found another, jnfsd (http://members.aol.com/_ht_a/markmitche11/jnfsd.htm). This is also written
in Java, but again restricted to NFSv2.

However, what is interesting about this NFS implementation is that it is based on a .x file.
These .x files describe the C API and the RPC packets in a C-like syntax, and .c files can
be built from them using a Unix tool called rpcgen. RemoteTea (remotetea.sourceforge.net) can also compile
.x files into .java files. Along with a jar file it provides, these Java files implement objects
representing all network connectivity, so that all that must be added is implementations of
the appropriate methods (i.e. READDIR, READ, WRITE, CREATE, etc.).
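To sketch the shape of what jrpcgen emits (all class and method names below are invented for illustration, not RemoteTea's actual generated code): the tool derives an abstract server stub with one abstract method per procedure in the .x file plus a dispatcher keyed on the RPC procedure number, so the gateway author only supplies the method bodies:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical miniature of a jrpcgen-style server stub. The procedure
// numbers are the real NFSv3 ones from nfs3_prot.x; everything else is
// simplified (real stubs decode/encode XDR, not Strings and byte[]).
abstract class Nfs3ServerStub {
    static final int NFSPROC3_READ = 6;
    static final int NFSPROC3_READDIR = 16;

    // The generated dispatcher: route on the procedure number from the
    // RPC call header, forward to the hand-written implementation.
    Object dispatch(int procedure, Object args) {
        switch (procedure) {
            case NFSPROC3_READDIR: return READDIR((String) args);
            case NFSPROC3_READ:    return READ((String) args);
            default: throw new IllegalArgumentException("PROC_UNAVAIL: " + procedure);
        }
    }

    // Only these need hand-written bodies in the NFS-DFS gateway.
    abstract List<String> READDIR(String dirHandle);
    abstract byte[] READ(String fileHandle);
}

// The hand-written half: each call would map onto DFS operations;
// faked here with constants so the sketch is self-contained.
class HdfsNfs3Server extends Nfs3ServerStub {
    @Override List<String> READDIR(String dir) {
        return Arrays.asList(".", "..", "part-00000"); // would call dfs.listPaths(dir)
    }
    @Override byte[] READ(String fh) {
        return "hello".getBytes(); // would call dfs.open(...).read(...)
    }
}
```

The appeal of this split is that regenerating the stub from a newer .x file (e.g. NFSv4) leaves the dispatch plumbing to the tool.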

jnfsd compiles nfs_prot.x (v2). So instead, I downloaded nf3_prot.x and ran that through jrpcgen.
Since then, I've been slowly adding implementations of calls, but haven't had a chance to test
it yet.

I have the following points to compare the two:

For the RemoteTea-based solution
 * Follows directly a .x description of the RPC protocol, so could be moved up to NFS v4 in the future
 * RemoteTea handles the network API, etc.

Against the current RemoteTea solution
 * RemoteTea creates lots of objects when performing RPC calls, perhaps too many?
 * Stuck within the RemoteTea framework
 * (I haven't finished it)
 * Memory-based caching of handles - can be expensive, e.g. for du operations [du has no RPC
call, so it requires recursive READDIR ops]

For Dhruba's solution
 * Easier to customise?
 * Disk based caching of NFS handles

Against Dhruba's solution
 * Harder to port to other NFS versions?
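To make the handle-caching trade-off above concrete, here is a minimal in-memory handle cache (a hypothetical sketch, taken from neither patch). NFS clients hold opaque handles, so the server must remember handle-to-path mappings in both directions; with memory-based caching, every file touched by a recursive READDIR walk (a client-side du) leaves an entry behind:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of memory-based NFS handle caching. Real NFSv3 handles are up to
// 64 opaque bytes; a long stands in for one here.
class HandleCache {
    private final Map<Long, String> handleToPath = new HashMap<>();
    private final Map<String, Long> pathToHandle = new HashMap<>();
    private final AtomicLong next = new AtomicLong(1);

    // Issue a handle for a path, reusing the existing one if present so the
    // client always sees a stable handle for the same file.
    synchronized long handleFor(String path) {
        return pathToHandle.computeIfAbsent(path, p -> {
            long h = next.getAndIncrement();
            handleToPath.put(h, p);
            return h;
        });
    }

    // Resolve a handle presented by the client; null would become NFS3ERR_STALE.
    synchronized String pathFor(long handle) {
        return handleToPath.get(handle);
    }

    synchronized int size() { return handleToPath.size(); }
}
```

A disk-backed variant (as in Dhruba's patch) trades lookup speed for bounded memory, which matters exactly in the recursive-READDIR case.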


NFS writing semantics

I picked this up from the NFS RFC (http://www.faqs.org/rfcs/rfc1813.html):

   The NFS version 3 protocol introduces safe asynchronous writes
   on the server, when the WRITE procedure is used in conjunction
   with the COMMIT procedure. The COMMIT procedure provides a way
   for the client to flush data from previous asynchronous WRITE
   requests on the server to stable storage and to detect whether
   it is necessary to retransmit the data.
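The WRITE/COMMIT interplay the RFC describes can be sketched as follows (hypothetical code, from neither patch): UNSTABLE writes are only buffered and acknowledged; COMMIT pushes them to stable storage and returns a write verifier that changes whenever the server restarts, which is how the client detects that it must retransmit uncommitted data:

```java
import java.io.ByteArrayOutputStream;

// Sketch of NFSv3 safe asynchronous writes on the server side. "Stable
// storage" is faked with a second in-memory buffer to keep this self-contained.
class UnstableWriteBuffer {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final ByteArrayOutputStream stable = new ByteArrayOutputStream();
    // In a real server this would be persisted per boot; any value that
    // differs across restarts works as a write verifier.
    private final long bootVerifier = System.nanoTime();

    // WRITE with stable_how == UNSTABLE: buffer only, reply immediately.
    void writeUnstable(byte[] data) {
        pending.write(data, 0, data.length);
    }

    // COMMIT: flush buffered data to stable storage and return the verifier,
    // so the client can compare it against the one from the earlier WRITE reply.
    long commit() {
        byte[] buf = pending.toByteArray();
        stable.write(buf, 0, buf.length);
        pending.reset();
        return bootVerifier;
    }

    int pendingBytes() { return pending.size(); }
    byte[] stableContents() { return stable.toByteArray(); }
}
```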

With the introduction of the extended writing API you described for DFS, it seems we are
making progress towards a suitable writing solution. I would indeed suggest replacing whole
blocks for random-like writes. My feeling is that we need to experiment with various
NFS clients, particularly Linux, to determine how they write files for typical operations.
People know that the DFS is designed for storing large files in streams, so it's probably
acceptable if a random write essentially requires a 64MB copy, update and replicate (i.e. slow).
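The copy-update step of that slow path might look like the following (an illustrative sketch only, assuming the block is materialised as a byte array; real blocks are 64MB and the replicate step is the expensive part):

```java
import java.util.Arrays;

// Sketch of a random write against a write-once block: copy the whole
// block, splice in the new bytes, and hand back the patched copy for the
// caller to write and replicate as a replacement block.
class BlockPatcher {
    static byte[] patchBlock(byte[] block, int offset, byte[] data) {
        byte[] copy = Arrays.copyOf(block, block.length);        // 1. copy the block
        System.arraycopy(data, 0, copy, offset, data.length);    // 2. update in place
        return copy;                                             // 3. caller replicates
    }
}
```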


> Export the HDFS file system through a NFS protocol
> --------------------------------------------------
>                 Key: HADOOP-834
>                 URL: https://issues.apache.org/jira/browse/HADOOP-834
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: nfshadoop.tar.gz
> It would be nice if we can expose the HDFS filesystem using the NFS protocol.
> There are a couple of options that I could find:
> 1. Use a user-space C-language implementation of an NFS server and then use the libhdfs
API to integrate that code with Hadoop. There is such an implementation available at http://sourceforge.net/project/showfiles.php?group_id=66203.
> 2. Use a user-space Java implementation of an NFS server and then integrate it with HDFS
using the Java API. There is such an implementation of an NFS server at http://void.org/~steven/jnfs/.
> I have experimented with Option 2 and have written a first version of the Hadoop integration.
I am attaching the code for your preliminary feedback. This implementation of the Java NFS
server has one limitation: it supports UDP only. Some licensing issues will have to be sorted
out before it can be used. Steve (the writer of the NFS server implementation) has told me
that he can change the licensing of the code if needed.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
