hadoop-hdfs-dev mailing list archives

From "Chen Liang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-12325) SFTPFileSystem operations should restore cwd
Date Sat, 19 Aug 2017 00:24:00 GMT
Chen Liang created HDFS-12325:

             Summary: SFTPFileSystem operations should restore cwd
                 Key: HDFS-12325
                 URL: https://issues.apache.org/jira/browse/HDFS-12325
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Chen Liang
            Assignee: Chen Liang

We've seen a case where writing to {{SFTPFileSystem}} led to unexpected behaviour:

Given a directory ./data containing more than one file, the steps to reproduce the error
are simply:
hdfs dfs -fs sftp://x.y.z -mkdir dir0
hdfs dfs -fs sftp://x.y.z -copyFromLocal data dir0
hdfs dfs -fs sftp://x.y.z -ls -R dir0
But not all of the files show up in the ls output; in fact, more often than not only a
single file shows up under that path...

Digging deeper, we found that the rename, mkdirs and create operations in {{SFTPFileSystem}}
change the current working directory during their execution. For example, in create there is:
      os = client.put(f.getName());

The issue is that {{SFTPConnectionPool}} caches SFTP sessions (in {{idleConnections}}), and
each cached session retains its current working directory. After these operations, sessions
are returned to the cache with a changed working directory. The change accumulates across
calls and ends up causing the unexpected behaviour above. In essence, the error shows up
when a single operation processes multiple file system objects and relative paths are used.
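The mechanism can be sketched with a small self-contained mock (hypothetical classes, not the Hadoop code): a pooled session whose working directory is mutated by an operation and never reset, so the next caller resolves relative paths under a stale cwd:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StaleCwdDemo {
    // Hypothetical stand-in for an SFTP channel: only tracks the cwd.
    static class MockSession {
        String cwd = "/";
        void cd(String dir) {
            cwd = cwd.endsWith("/") ? cwd + dir : cwd + "/" + dir;
        }
    }

    // Stand-in for SFTPConnectionPool's idleConnections cache.
    static final Deque<MockSession> idle = new ArrayDeque<>();

    static MockSession connect() {
        MockSession s = idle.poll();
        return s != null ? s : new MockSession();
    }

    static void disconnect(MockSession s) {
        idle.push(s); // session cached as-is; cwd is NOT reset
    }

    // Mimics an operation that cds into the parent dir, then writes by name.
    static String create(MockSession s, String parent, String name) {
        s.cd(parent);              // mutates the session's cwd
        return s.cwd + "/" + name; // path the file actually lands at
    }

    public static void main(String[] args) {
        MockSession s1 = connect();
        System.out.println(create(s1, "dir0/data", "a.txt")); // /dir0/data/a.txt
        disconnect(s1);

        // Reuses the cached session: the stale cwd doubles the prefix,
        // so the second file lands in the wrong place.
        MockSession s2 = connect();
        System.out.println(create(s2, "dir0/data", "b.txt")); // /dir0/data/dir0/data/b.txt
    }
}
```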

The fix here is to restore the current working directory of the SFTP session before it is
returned to the pool.
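The restore pattern can be sketched as follows, again with a hypothetical mock session rather than the committed patch: capture the working directory on entry and put it back in a finally block, so the session goes back to the cache unchanged even if the operation throws:

```java
import java.util.function.Consumer;

public class RestoreCwdSketch {
    // Hypothetical stand-in for an SFTP channel: only tracks the cwd.
    static class MockSession {
        String cwd = "/";
        String pwd() { return cwd; }
        void cd(String dir) {
            // Absolute paths replace the cwd; relative paths append to it.
            cwd = dir.startsWith("/") ? dir
                : (cwd.endsWith("/") ? cwd + dir : cwd + "/" + dir);
        }
    }

    // Run an operation that may cd around, then restore the session's cwd.
    static void withRestoredCwd(MockSession s, Consumer<MockSession> op) {
        final String saved = s.pwd(); // remember cwd on entry
        try {
            op.accept(s);
        } finally {
            s.cd(saved);              // restore before the session is reused
        }
    }

    public static void main(String[] args) {
        MockSession s = new MockSession();
        withRestoredCwd(s, sess -> sess.cd("dir0/data")); // op mutates cwd
        System.out.println(s.pwd()); // prints "/" -- cwd was restored
    }
}
```

With this wrapper, repeated operations on a pooled session always start from the same working directory, which removes the accumulation described above.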

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
