From: "Sanjay Radia (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Date: Fri, 7 Mar 2008 17:35:46 -0800 (PST)
Subject: [jira] Issue Comment Edited: (HADOOP-2885) Restructure the hadoop.dfs package
Message-ID: <16717863.1204940146432.JavaMail.jira@brutus>
In-Reply-To: <18157864.1203719359298.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

    [ https://issues.apache.org/jira/browse/HADOOP-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576462#action_12576462 ]

sanjay.radia edited comment on HADOOP-2885 at 3/7/08 5:33 PM:
--------------------------------------------------------------

Here are the 3 proposals on the table with their pros and cons.

Terminology: I am calling the impls of FileSystem (e.g. DistributedFileSystem) the wrapper.

h2. Proposal 1: No HDFS in core

core
- org.apache.hadoop.{io,conf,ipc,util,fs}
- fs contains the kfs, s3, etc. wrappers BUT no HDFS classes. FileSystem.get(conf) constructs DistributedFileSystem via dynamic class loading (see the sketch after this proposal).

hdfs
- org.apache.hadoop.fs.hdfs contains client side and server side
- Will generate 2 jars: hdfs-client.jar and hdfs-server.jar

mapred
- org.apache.hadoop.mapred

h4. Pros:
- Can rev the HDFS client protocol by merely supplying a new jar. (Note that in practice this is not that useful in a distributed system, since you have to distribute the updated protocol jar to all machines running the application.)
- The hdfs protocol is not visible in the core src tree
- javadoc == ALL the classes in core

h4. Cons:
- App needs 2 jars: core.jar and hdfs-client.jar
- Structure is not similar to fs.kfs and fs.s3
- Harder to make DistributedFileSystem public if we wish, since it is not sitting in core (I don't think we should make it public anyway)
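A minimal sketch of the dynamic-loading path Proposal 1 relies on, assuming the 0.16-era API: FileSystem.get(conf) carries no compile-time reference to DistributedFileSystem; it resolves the implementation class from configuration (the fs.<scheme>.impl key of that era's hadoop-default.xml, if memory serves) and instantiates it reflectively, so an application compiled against core.jar only needs hdfs-client.jar on its runtime classpath. Host names, ports, and key/class names below are placeholders, not verified values.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DynamicFsLoading {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder namenode address; the URI scheme ("hdfs") selects which wrapper to load.
    conf.set("fs.default.name", "hdfs://namenode.example.com:9000/");
    // Under Proposal 1 the binding comes from configuration rather than a compile-time
    // reference, e.g. (key and class names are illustrative of the proposal, not verified):
    // conf.set("fs.hdfs.impl", "org.apache.hadoop.fs.hdfs.DistributedFileSystem");

    FileSystem fs = FileSystem.get(conf);          // reflective construction of the wrapper
    System.out.println(fs.getClass().getName());   // prints the dynamically loaded impl
    System.out.println(fs.exists(new Path("/")));  // ordinary use goes through the abstract API
    fs.close();
  }
}
{code}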
h2. Proposal 2: Client side HDFS [wrapper and protocol] in core

core
- org.apache.hadoop.{io,conf,ipc,util,fs}
- fs.hdfs contains DistributedFileSystem and DFSClient
- fs contains the kfs, s3, etc. wrappers

hdfs
- org.apache.hadoop.fs.hdfs contains server side only

mapred
- org.apache.hadoop.mapred

h4. Pros:
- Apps need only one jar - core
- Structure is *partially* similar to fs.kfs and fs.s3. *Partially* and not *fully* similar because DFSClient is in core's fs.hdfs; the other fs wrappers do not contain their protocols.
- Easier to make DistributedFileSystem public if we wish, since it is sitting in core (I don't think we should make it public anyway)

h4. Cons:
- Revving the HDFS protocol requires updating core
- The hdfs protocol is visible in the core src tree
- core's javadoc will need to exclude DFSClient and DistributedFileSystem

h2. Proposal 3: HDFS Client Wrapper in core, HDFS protocol is separate

core
- org.apache.hadoop.{io,conf,ipc,util,fs}
- fs.hdfs contains DistributedFileSystem (but NOT DFSClient)
- Structure is similar to fs.kfs and fs.s3 in that a wrapper for each file system sits in core's fs.

hdfs
- org.apache.hadoop.fs.hdfs contains server side and DFSClient
- Two jars

mapred
- org.apache.hadoop.mapred

h4. Pros:
- Can rev the HDFS client protocol by merely supplying a new jar
- The hdfs protocol is not visible in the core src tree
- Structure is similar to fs.kfs and fs.s3
- Easier to make DistributedFileSystem public if we wish, since it is sitting in core (I don't think we should make it public anyway)

h4. Cons:
- App needs the core jar and the hdfs-client jar
- Circular dependency between the core jar and the hdfs-client jar
- core's javadoc will need to exclude DistributedFileSystem (applications code only against the abstract FileSystem API; see the sketch below)
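For context on the javadoc and public-surface points above, a hedged sketch of what application code looks like under any of the three proposals (the path is made up; the calls are the abstract FileSystem API of that era): it imports only org.apache.hadoop.fs types from core and never touches DFSClient, DistributedFileSystem, or server-side classes directly, which is what makes excluding those classes from core's javadoc viable.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PublicSurfaceOnly {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://namenode.example.com:9000/"); // placeholder address

    // Everything below is the abstract org.apache.hadoop.fs API from core;
    // no import of DFSClient, DistributedFileSystem, or any dfs server class.
    FileSystem fs = FileSystem.get(conf);
    Path p = new Path("/tmp/hadoop-2885-example.txt");  // hypothetical path

    FSDataOutputStream out = fs.create(p);
    out.writeUTF("written through the FileSystem abstraction");
    out.close();

    FSDataInputStream in = fs.open(p);
    System.out.println(in.readUTF());
    in.close();

    fs.delete(p);
    fs.close();
  }
}
{code}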
> Restructure the hadoop.dfs package
> ----------------------------------
>
>                 Key: HADOOP-2885
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2885
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: dfs
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>            Priority: Minor
>             Fix For: 0.17.0
>
>         Attachments: Prototype dfs package.png
>
>
> This Jira proposes restructuring the package hadoop.dfs.
> 1. Move all server side and internal protocols (NN-DN etc.) to hadoop.dfs.server.*
> 2. Further breakdown of dfs.server:
>    - dfs.server.namenode.*
>    - dfs.server.datanode.*
>    - dfs.server.balancer.*
>    - dfs.server.common.* - stuff shared between the various servers
>    - dfs.protocol.* - internal protocol between DN, NN and Balancer etc.
> 3. Client interface:
>    - hadoop.dfs.DistributedFileSystem.java
>    - hadoop.dfs.ChecksumDistributedFileSystem.java
>    - hadoop.dfs.HftpFileSystem.java
>    - hadoop.dfs.protocol.* - the client side protocol

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.