hadoop-common-dev mailing list archives

From "Sanjay Radia (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-2885) Restructure the hadoop.dfs package
Date Sat, 08 Mar 2008 01:27:48 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576462#action_12576462
] 

sanjay.radia edited comment on HADOOP-2885 at 3/7/08 5:26 PM:
--------------------------------------------------------------

Here are the 3 proposals on the table, with their pros and cons.

Terminology: I refer to implementations of FileSystem (e.g. DistributedFileSystem) as wrappers.
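As a minimal sketch of this terminology (the class bodies below are hypothetical stand-ins, not the real Hadoop code): the wrapper extends the abstract FileSystem API that applications code against, and delegates the actual wire-protocol work to a client class such as DFSClient.

```java
// Hypothetical sketch: the "wrapper" is the FileSystem subclass an
// application uses; the protocol details live in DFSClient.
abstract class FileSystem {
    abstract byte[] read(String path);
}

class DFSClient {
    // Stand-in for the class that speaks the HDFS client protocol.
    byte[] read(String path) { return new byte[0]; }
}

class DistributedFileSystem extends FileSystem {
    // The wrapper: delegates everything to DFSClient.
    private final DFSClient client = new DFSClient();
    @Override
    byte[] read(String path) { return client.read(path); }
}
```

Which jar each of these classes lands in is exactly what the three proposals differ on.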

h2. Proposal 1: No HDFS in core

core 
    org.apache.hadoop.{io,conf,ipc,util,fs}
     fs contains kfs, s3 wrappers etc. BUT no HDFS classes.
     FileSystem.get(conf) constructs DistributedFileSystem via dynamic class loading.

hdfs
   org.apache.hadoop.fs.hdfs  contains client side and server side
    Will generate 2 jars: hdfs-client.jar and hdfs-server.jar

mapred
	org.apache.hadoop.mapred
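The dynamic class loading mentioned above can be sketched as follows (the conf key and helper class names are illustrative assumptions, not the actual Hadoop API): core looks up the implementation class name in the configuration and instantiates it reflectively, so core.jar compiles without any HDFS classes and fails only at runtime if hdfs-client.jar is absent.

```java
// Illustrative sketch of dynamic class loading (class and key names
// are hypothetical): core resolves the FileSystem implementation by
// name at runtime instead of referencing DistributedFileSystem directly.
class FileSystemLoader {
    static Object get(java.util.Map<String, String> conf) throws Exception {
        // e.g. "fs.hdfs.impl" -> "org.apache.hadoop.fs.hdfs.DistributedFileSystem"
        String implName = conf.get("fs.hdfs.impl");
        // Class.forName throws ClassNotFoundException if the jar holding
        // the implementation is not on the classpath -- core itself never
        // links against HDFS at compile time.
        Class<?> impl = Class.forName(implName);
        return impl.getDeclaredConstructor().newInstance();
    }
}
```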

h5. Pros:
   Can rev the HDFS client protocol by merely supplying a new jar.
   (note that in practice this is not that useful in a distributed system
     since you have to distribute the updated protocol jar to all machines
     running the application).
   The hdfs protocol is not visible in core src tree
   javadoc covers ALL the classes in core (no exclusions needed)

h5. Cons:
  App needs 2 jars: core.jar and hdfs-client.jar
  Structure is not similar to fs.kfs and fs.s3
  Harder to make DistributedFileSystem public if we wish, since it is not sitting
  in core (I don't think we should make it public anyway)



h2. Proposal 2: Client side HDFS [wrapper and protocol] in core

core 
    org.apache.hadoop.{io,conf,ipc,util,fs}
    fs.hdfs contains DistributedFileSystem and DFSClient
    fs contains kfs, s3 wrappers etc.

hdfs
   org.apache.hadoop.fs.hdfs contains server side only

mapred
   org.apache.hadoop.mapred



h5. Pros: 
   Apps need only one jar - core
   Structure is *partially* similar to fs.kfs and fs.s3
   *Partially* and not *fully* similar because DFSClient is in core's fs.hdfs
   The other fs wrappers do not contain their protocols
  Easier to make DistributedFileSystem public if we wish, since it is sitting
  in core (I don't think we should make it public anyway)
h5. Cons: 
   Revving the HDFS protocol requires updating core
   The hdfs protocol is visible in core src tree
   core's javadoc will need to exclude DFSClient and DistributedFileSystem


h2. Proposal 3: HDFS Client Wrapper in core, HDFS protocol is separate

core 
    org.apache.hadoop.{io,conf,ipc,util,fs}
    fs.hdfs contains DistributedFileSystem (but NOT DFSClient)
    Structure is similar to fs.kfs and fs.s3 in that a wrapper for each file system
    sits in core's fs.

hdfs
   org.apache.hadoop.fs.hdfs contains server side and DFSClient
   Will generate 2 jars: hdfs-client.jar and hdfs-server.jar


mapred
   org.apache.hadoop.mapred

h5. Pros:
   Can rev the HDFS client protocol by merely supplying a new jar
   The hdfs protocol is not visible in core src tree
   Structure is similar to fs.kfs and fs.s3
  Easier to make DistributedFileSystem public if we wish, since it is sitting
  in core (I don't think we should make it public anyway)
h5. Cons:
  App needs core jar and hdfs-client jar
  Circular dependency between core.jar and hdfs-client.jar (DistributedFileSystem in core depends on DFSClient in hdfs-client, while DFSClient depends on core's io/ipc classes)
  core's javadoc will need to exclude DistributedFileSystem



> Restructure the hadoop.dfs package
> ----------------------------------
>
>                 Key: HADOOP-2885
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2885
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: dfs
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>            Priority: Minor
>             Fix For: 0.17.0
>
>         Attachments: Prototype dfs package.png
>
>
> This Jira proposes restructuring the package hadoop.dfs.
> 1. Move all server side and internal protocols (NN-DN etc.) to hadoop.dfs.server.*
> 2. Further breakdown of dfs.server.
> - dfs.server.namenode.*
> - dfs.server.datanode.*
> - dfs.server.balancer.*
> - dfs.server.common.* - stuff shared between the various servers
> - dfs.protocol.*  - internal protocol between DN, NN and Balancer etc.
> 3. Client interface:
> - hadoop.dfs.DistributedFileSystem.java
> - hadoop.dfs.ChecksumDistributedFileSystem.java
> - hadoop.dfs.HftpFilesystem.java
> - hadoop.dfs.protocol.* - the client side protocol

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

