hadoop-common-dev mailing list archives

From "Sanjay Radia (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-2885) Restructure the hadoop.dfs package
Date Mon, 10 Mar 2008 19:18:46 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577032#action_12577032 ]

sanjay.radia edited comment on HADOOP-2885 at 3/10/08 12:16 PM:
----------------------------------------------------------------



As far as Hadoop goes, the interface is fs.FileSystem.
What is the interface of hdfs which implements fs.FileSystem?
* fs.hdfs.DistributedFileSystem
* fs.hdfs.theProtocol

Even though we may consider the above two interfaces to be private, it is worth discussing
which of the two is hdfs's interface. (See my note below about whether
these two interfaces are considered public or private.)
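
To make the distinction concrete, here is a minimal sketch of how application code sees only fs.FileSystem today: the binding to hdfs happens behind FileSystem.get(), so neither candidate interface is named by the app (the path and class name below are illustrative, not proposed code):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsFacadeSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The app programs against the fs.FileSystem abstraction only;
    // FileSystem.get() picks the implementation from the fs.default.name
    // URI scheme. DistributedFileSystem and the wire protocol both stay
    // hidden behind this call.
    FileSystem fs = FileSystem.get(conf);
    for (FileStatus stat : fs.listStatus(new Path("/"))) {
      System.out.println(stat.getPath());
    }
  }
}
{code}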


*Analogy*
For NFS, the wire protocol is the interface.
Proposal 2 would be the most suitable if we consider the HDFS protocol to be the interface.
Proposal 1 would also be okay as long as hdfs supplies 2 jars. Proposal 1 has the advantage
that there can be other implementations of the client-side wrappers that talk the hdfs
protocol (for example, other wrappers could do client-side caching while keeping the
protocol the same).
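
A rough sketch of what such an alternative wrapper could look like (CachingFileSystem and its trivial cache are hypothetical, not proposed code); it leans on the existing fs.FilterFileSystem to delegate to the standard client, so the wire protocol underneath is untouched:

{code:java}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FilterFileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical alternative client-side wrapper: same hdfs wire protocol
// underneath (via the wrapped FileSystem), caching added above it.
public class CachingFileSystem extends FilterFileSystem {
  private final Map<Path, FileStatus> statusCache = new HashMap<Path, FileStatus>();

  // Wrap an already-initialized hdfs client, e.g. FileSystem.get(conf).
  public CachingFileSystem(FileSystem hdfs) {
    super(hdfs);
  }

  @Override
  public FileStatus getFileStatus(Path p) throws IOException {
    FileStatus cached = statusCache.get(p);
    if (cached == null) {
      cached = fs.getFileStatus(p); // 'fs' is the wrapped client from FilterFileSystem
      statusCache.put(p, cached);
    }
    return cached;
  }
}
{code}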

For Posix, libc is the interface. The system calls are like the protocol that libc uses to
talk to the kernel. Each new version of Posix would ship new implementations of libc and
the system calls. Apps link dynamically with libc.
In a distributed system, distributing a new wrapper to all clients is hard to do, since the
clients are distributed and do not link dynamically with the wrapper.
Jini, for example, provides a way for clients to pull the new wrapper by means of dynamic
class loading across the wire (there were heated discussions over this in the Java
community). We have no plans to dynamically load classes across the wire. But nonetheless,
the OS view of its interface is a useful analogy. Proposal 1 would be most suitable for
this view.
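
For what it's worth, Hadoop already gets part-way to this late binding without loading classes across the wire: the wrapper class is resolved from configuration by URI scheme (the fs.hdfs.impl key in hadoop-default.xml), so shipping a new wrapper is a jar plus a config change rather than a relink. A small sketch (the namenode address is a placeholder):

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class SchemeBindingSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The client names only a scheme; the wrapper class behind it is
    // resolved from configuration (fs.hdfs.impl), not linked statically.
    // Pointing this key at another FileSystem subclass swaps the wrapper
    // without touching application code.
    System.out.println(conf.get("fs.hdfs.impl"));
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000/"), conf);
    System.out.println(fs.getClass().getName());
  }
}
{code}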


BTW, should DistributedFileSystem, DFSClient and the protocol be public or private interfaces?
So far I don't see any reason to make any of these public (although we should make
sure that the protocol remains compatible over time).

> Restructure the hadoop.dfs package
> ----------------------------------
>
>                 Key: HADOOP-2885
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2885
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: dfs
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>            Priority: Minor
>             Fix For: 0.17.0
>
>         Attachments: Prototype dfs package.png
>
>
> This Jira proposes restructuring the package hadoop.dfs.
> 1. Move all server side and internal protocols (NN-DN, etc.) to hadoop.dfs.server.*
> 2. Further breakdown of dfs.server.
> - dfs.server.namenode.*
> - dfs.server.datanode.*
> - dfs.server.balancer.*
> - dfs.server.common.* - stuff shared between the various servers
> - dfs.protocol.*  - internal protocol between DN, NN and Balancer etc.
> 3. Client interface:
> - hadoop.dfs.DistributedFileSystem.java
> - hadoop.dfs.ChecksumDistributedFileSystem.java
> - hadoop.dfs.HftpFileSystem.java
> - hadoop.dfs.protocol.* - the client side protocol

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

