hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12756) Incorporate Aliyun OSS file system implementation
Date Thu, 09 Jun 2016 22:20:21 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323487#comment-15323487
] 

Kai Zheng commented on HADOOP-12756:
------------------------------------

Have looked at the patch, some comments about the codes:

1. There are lots of configuration items introduced in {{Constants}}. It would be great if
we could follow the pattern used in {{CommonConfigurationKeys}} to name the property keys.
For example, in
{code}
 // Time until we give up on a connection to oss
 public static final String SOCKET_TIMEOUT = "fs.oss.connection.timeout";
 public static final int DEFAULT_SOCKET_TIMEOUT = 200000;
{code}
SOCKET_TIMEOUT -> SOCKET_TIMEOUT_KEY

2. As discussed above, better to: org.apache.hadoop.fs.oss -> org.apache.hadoop.fs.aliyun.oss,
OSSFileSystem -> AliyunOssFileSystem.

3. In the delete operation:
{code}
      if (!recursive) {
        FileStatus[] statuses = listStatus(status.getPath());
        if (statuses.length > 0) {
          throw new IOException("Cannot remove " + path + ": Is a directory!");
        } else {
          // Delete empty directory without '-r'
          ossClient.deleteObject(bucketName, key);
          statistics.incrementWriteOps(1);
        }
      } else {
         // A large block here 
      }
{code}
The exception message could be more specific, like: "Cannot remote the directory, it's not
empty". 
And the comment "Delete empty directory without '-r'" should be a little above.
The large block of codes for delete the directory with '-r' would be better to move to a separate
function.

{code}
    //TODO: optimize logic here
    try {
      Path pPath = status.getPath().getParent();
      FileStatus pStatus = getFileStatus(pPath);
      if (pStatus.isDirectory()) {
        return true;
      } else {
        throw new IOException("Path " + pPath +
            " is assumed to be a directory!");
      }
    } catch (FileNotFoundException fnfe) {
      return mkdir(bucketName, pathToKey(status.getPath().getParent()));
    }
{code}
Any explanation about this post processing logic after the File/Directory object is deleted
successfully? Why the parent directory could be not found, and then we need to do the mkdir
here?

4. Do you want to rename {{validatePath}} to {{validatePathForMkdir}}, as indicated by the
thrown exception:
{code}
          throw new FileAlreadyExistsException(String.format(
              "Can't make directory for path '%s' since it is a file.", fPart));
{code}

5. Please also check codes in other operations like {{rename}}, similarly.

6. Could we move the very AliyunOss specific logic codes to an utility class to simplify the
codes in the file system impl and input/output stream?

The tests look complete and great. Thanks!



> Incorporate Aliyun OSS file system implementation
> -------------------------------------------------
>
>                 Key: HADOOP-12756
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12756
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 2.8.0
>            Reporter: shimingfei
>            Assignee: shimingfei
>         Attachments: HADOOP-12756-v02.patch, HADOOP-12756.003.patch, HADOOP-12756.004.patch,
HCFS User manual.md, OSS integration.pdf, OSS integration.pdf
>
>
> Aliyun OSS is widely used among China’s cloud users, but currently it is not easy to
access data laid on OSS storage from user’s Hadoop/Spark application, because of no original
support for OSS in Hadoop.
> This work aims to integrate Aliyun OSS with Hadoop. By simple configuration, Spark/Hadoop
applications can read/write data from OSS without any code change. Narrowing the gap between
user’s APP and data storage, like what have been done for S3 in Hadoop 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message