hadoop-common-issues mailing list archives

From "Hongyuan Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems
Date Wed, 05 Jul 2017 06:02:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074285#comment-16074285 ]

Hongyuan Li commented on HADOOP-14444:

1、seek causes the client to disconnect and connect again; I don't think it is a good idea to implement it.
2、{{AbstractFTPFileSystem}} is described as the abstract base for FTP-like FileSystems. Sorry to interrupt
you, but the FTP protocol is not at all like the SFTP protocol. The only thing the two have in common is that
they use a username and password to connect to the ftp/sftp server and then perform a series of operations.
I suggest using another name.
3、About the passwd and user, see code like the below, where {{sftpFile}} is an LsEntry instance:
    String longName = sftpFile.getLongname();
    String[] splitLongName = longName.split(" ");
    String user = getUserOrGroup("user", splitLongName);
    String group = getUserOrGroup("group", splitLongName);

  private String getUserOrGroup(String flag, String[] splitLongName) {

    // count tracks the index of the current non-empty token
    int count = 0;
    int desPos = getPos(flag);
    for (String element : splitLongName) {

      if (count == desPos && !"".equals(element)) {
        return element;
      }
      if (!"".equals(element)) {
        count++;
      }
    }
    return null;
  }

  /**
   * generate the pos
   * @param flag "user" or "group"
   * @return index of the matching field in the long name
   */
  private int getPos(String flag) {

    if ("user".equals(flag)) {
      return 2;
    } else {
      return 3;
    }
  }


4、Should {{SFTPChannel}}#{{close}} close the session as well?

5、I don't know whether I can be seen as a reviewer. I'm just interested in your implementation.
Good job. :D
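
As a rough illustration of point 3, the sketch below pulls the owner and group out of an `ls -l`-style long name by splitting on runs of whitespace, so no empty tokens need to be skipped. The class and method names (LongnameOwner, field, user, group) are made up for this example and are not part of the patch or of JSch. It also assumes the server returns a Unix-style long name with the owner in the third field and the group in the fourth; the SFTP drafts leave that format unspecified, so other servers may differ.

```java
import java.util.Optional;

// Hypothetical helper, not taken from the HADOOP-14444 patch.
class LongnameOwner {
  // Assumed field layout: permissions, link count, owner, group, size, ...
  private static final int USER_FIELD = 2;
  private static final int GROUP_FIELD = 3;

  // Returns the whitespace-separated field at the given index, if present.
  static Optional<String> field(String longName, int index) {
    String[] parts = longName.trim().split("\\s+");
    if (index < parts.length && !parts[index].isEmpty()) {
      return Optional.of(parts[index]);
    }
    return Optional.empty();
  }

  static Optional<String> user(String longName) {
    return field(longName, USER_FIELD);
  }

  static Optional<String> group(String longName) {
    return field(longName, GROUP_FIELD);
  }

  public static void main(String[] args) {
    String sample = "-rw-r--r--   1 alice  staff  1024 Jul  5 06:02 a.txt";
    System.out.println(user(sample).orElse("unknown"));
    System.out.println(group(sample).orElse("unknown"));
  }
}
```

Returning an Optional instead of null makes the "field missing" case explicit for callers, which matters when the long name is shorter than expected.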

> New implementation of ftp and sftp filesystems
> ----------------------------------------------
>                 Key: HADOOP-14444
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14444
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 2.8.0
>            Reporter: Lukas Waldmann
>            Assignee: Lukas Waldmann
>         Attachments: HADOOP-14444.2.patch, HADOOP-14444.3.patch, HADOOP-14444.4.patch,
>                      HADOOP-14444.5.patch, HADOOP-14444.patch
> The current implementations of the FTP and SFTP filesystems have severe limitations and performance
> issues when dealing with a high number of files. My patch solves those issues and integrates
> both filesystems in such a way that most of the core functionality is common to both, thereby
> simplifying maintenance.
> The core features:
> * Support for HTTP/SOCKS proxies
> * Support for passive FTP
> * Support for connection pooling - a new connection is not created for every single command
>   but is reused from the pool.
> For a huge number of files it shows an order-of-magnitude performance improvement over
> non-pooled connections.
> * Caching of directory trees. With FTP you always need to list the whole directory whenever
>   you ask for information about a particular file.
> Again, for a huge number of files it shows an order-of-magnitude performance improvement over
> non-cached connections.
> * Support for keep-alive (NOOP) messages to avoid connection drops
> * Support for Unix-style or regexp wildcard globs - useful for listing particular files
>   across a whole directory tree
> * Support for re-establishing broken FTP data transfers - this can happen surprisingly often
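
The connection pooling item in the description above can be illustrated with a minimal sketch. This is not the pool from the patch: SimplePool, borrow, giveBack and createdCount are hypothetical names, and the connection type is left generic because the patch's real client classes are not shown in this message. The idea is only that an idle connection is handed out again instead of a new one being opened for every command.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical minimal pool, for illustration only.
class SimplePool<C> {
  private final Deque<C> idle = new ArrayDeque<>();
  private final Supplier<C> factory;
  private int created = 0;

  SimplePool(Supplier<C> factory) {
    this.factory = factory;
  }

  // Hands out an idle connection if one exists, otherwise creates a new one.
  synchronized C borrow() {
    C connection = idle.pollFirst();
    if (connection == null) {
      created++;
      connection = factory.get();
    }
    return connection;
  }

  // Returns a connection to the pool so that the next borrow() can reuse it.
  synchronized void giveBack(C connection) {
    idle.addFirst(connection);
  }

  // Number of connections actually created through the factory so far.
  synchronized int createdCount() {
    return created;
  }
}
```

A real pool would also need a size limit, liveness checks (for example the NOOP keep-alive mentioned above) and eviction of broken connections; those are left out to keep the sketch short.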

This message was sent by Atlassian JIRA

