hadoop-common-issues mailing list archives

From Íñigo Goiri (JIRA) <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems
Date Mon, 04 Dec 2017 23:30:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277748#comment-16277748 ]

Íñigo Goiri commented on HADOOP-14444:
--------------------------------------

[~luky] thanks for this work; it looks like a big improvement over the current FTP and SFTP implementations.
I tried to go through the discussion in the JIRA, but I'm pretty sure I missed some of it.
My main question is about the transition from the old FTP/SFTP filesystems to the new ones.
Should we make the new ones the default? Probably the safest bet would be to release this
and, once it has been tested in the wild, point to this implementation by default and
maybe mark the old ones as deprecated.

> New implementation of ftp and sftp filesystems
> ----------------------------------------------
>
>                 Key: HADOOP-14444
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14444
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 2.8.0
>            Reporter: Lukas Waldmann
>            Assignee: Lukas Waldmann
>         Attachments: HADOOP-14444.10.patch, HADOOP-14444.11.patch, HADOOP-14444.12.patch,
HADOOP-14444.13.patch, HADOOP-14444.2.patch, HADOOP-14444.3.patch, HADOOP-14444.4.patch, HADOOP-14444.5.patch,
HADOOP-14444.6.patch, HADOOP-14444.7.patch, HADOOP-14444.8.patch, HADOOP-14444.9.patch, HADOOP-14444.patch
>
>
> The current implementations of the FTP and SFTP filesystems have severe limitations and performance
issues when dealing with a high number of files. My patch solves those issues and integrates
both filesystems in such a way that most of the core functionality is common to both, which
simplifies maintenance.
> The core features:
> * Support for HTTP/SOCKS proxies
> * Support for passive FTP
> * Support for explicit FTPS (SSL/TLS)
> * Support for connection pooling - a new connection is not created for every single command
but is reused from the pool.
> For a huge number of files, this shows an order-of-magnitude performance improvement over
non-pooled connections.
> * Caching of directory trees. With FTP you always need to list the whole directory whenever
you ask for information about a particular file.
> Again, for a huge number of files, this shows an order-of-magnitude performance improvement
compared with no caching.
> * Support for keep-alive (NOOP) messages to avoid connection drops
> * Support for Unix-style or regexp wildcard globs - useful for listing particular files
across a whole directory tree
> * Support for re-establishing broken FTP data transfers - these happen surprisingly often
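
To illustrate the connection pooling feature in the list above, here is a minimal sketch in Java. It is not code from the HADOOP-14444 patches: the class name ConnectionPoolSketch and its methods are hypothetical, and the connection type is left generic so that no particular FTP or SFTP client API is assumed. The idea is only that a command borrows an idle connection when one exists and opens a new one otherwise.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/**
 * Hypothetical, simplified connection pool. C stands in for whatever
 * client object represents an open FTP/SFTP connection.
 */
public class ConnectionPoolSketch<C> {
    private final Deque<C> idle = new ArrayDeque<>();
    private final Supplier<C> factory; // opens a new connection (assumed cheap to call here)
    private final int maxIdle;         // upper bound on idle connections kept around
    private int created = 0;           // how many connections were actually opened

    public ConnectionPoolSketch(Supplier<C> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    /** Returns an idle connection if one exists, otherwise opens a new one. */
    public synchronized C borrow() {
        C conn = idle.pollFirst();
        if (conn == null) {
            created++;
            conn = factory.get();
        }
        return conn;
    }

    /**
     * Hands a connection back after a command finishes. Returns false when
     * the pool is full, in which case the caller should close the connection.
     */
    public synchronized boolean giveBack(C conn) {
        if (idle.size() < maxIdle) {
            idle.addFirst(conn);
            return true;
        }
        return false;
    }

    /** Number of connections opened through the factory so far. */
    public synchronized int createdCount() {
        return created;
    }
}
```

With this shape, a sequence of commands that each call borrow() and then giveBack() ends up sharing one connection instead of opening one per command, which is where the saving for large file counts would come from. A real pool would also need to validate idle connections (for example with the keep-alive messages mentioned above) before reusing them.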




