reef-dev mailing list archives

From Rogan Carr <rogan.c...@gmail.com>
Subject [REEF-1892] HDFS File Copy only uses local HDFS
Date Sun, 24 Sep 2017 05:57:32 GMT
Hi All,

I have opened an issue, REEF-1892 [1], because file I/O to WASB is broken in
REEF 0.17.x.

In REEF-1827 [2], the URIs used to specify remote and local files were
changed to pass only their `AbsolutePath` [3].

This means that a file specified as "hdfs://namenode/my/file" becomes
"/my/file", and the `dfs` command then resolves that bare path against the
cluster's default file system, which is normally HDFS.
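To make the stripping concrete, here is a minimal sketch in Java (the REEF code in [3] is C#). It assumes `java.net.URI.getPath()` is a close enough stand-in for .NET's `Uri.AbsolutePath`, and the host and port `namenode:8020` are made up for illustration. Only the path survives; the scheme and authority are dropped.

```java
import java.net.URI;

class AbsolutePathDemo {
    // Rough Java analogue of .NET's Uri.AbsolutePath: keep only the path component
    static String pathOnly(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        // Hypothetical HDFS URI; "namenode:8020" is an illustrative host and port
        URI uri = URI.create("hdfs://namenode:8020/my/file");
        System.out.println(uri.getScheme());          // hdfs
        System.out.println(uri.getAuthority());       // namenode:8020
        System.out.println(pathOnly(uri.toString())); // /my/file (scheme and authority are gone)
    }
}
```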

This is fine if you are using vanilla HDFS, but Azure Blob Storage uses a
different scheme, `wasb://`, in place of `hdfs://`. Because the AbsolutePath
property strips the `wasb://` scheme and the authority that follows it, this
Copy() function attempts to copy the file from the local HDFS instead of
from WASB.
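The WASB failure can be modelled the same way. This is a simplified sketch, not Hadoop's actual code: it assumes, as the Hadoop shell does, that a path without a scheme is qualified against the cluster's default file system (`fs.defaultFS`). The `DEFAULT_FS` value and the container and account names are hypothetical.

```java
import java.net.URI;

class DefaultFsResolution {
    // Hypothetical cluster default file system, standing in for Hadoop's fs.defaultFS
    static final URI DEFAULT_FS = URI.create("hdfs://namenode:8020");

    // Simplified model: a URI with a scheme is used as-is;
    // a bare path is qualified against DEFAULT_FS
    static URI qualify(String pathOrUri) {
        URI u = URI.create(pathOrUri);
        return u.getScheme() != null ? u : DEFAULT_FS.resolve(u);
    }

    public static void main(String[] args) {
        URI source = URI.create("wasb://mycontainer@myaccount.blob.core.windows.net/data/input.txt");
        // Full URI: stays on WASB
        System.out.println(qualify(source.toString()));
        // Path only: "/data/input.txt" is qualified against the default HDFS instead
        System.out.println(qualify(source.getPath())); // hdfs://namenode:8020/data/input.txt
    }
}
```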

Best,
Rogan

[1] https://issues.apache.org/jira/browse/REEF-1892

[2] https://issues.apache.org/jira/browse/REEF-1827

[3] The code in question:

public void Copy(Uri sourceUri, Uri destinationUri)
{
-    _commandRunner.Run("dfs -cp " + sourceUri + " " + destinationUri);
+    _commandRunner.Run("dfs -cp " + sourceUri.AbsolutePath + " " + destinationUri.AbsolutePath);
}
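One possible direction for a fix, sketched in Java: pass fully qualified URIs through to the shell, since the Hadoop FS shell accepts `scheme://authority/path` arguments, and fall back to the bare path only when a URI has no scheme. `toShellArg` and `buildCopyCommand` are hypothetical helpers, not REEF APIs, and this sketch may reintroduce whatever problem REEF-1827 was meant to solve.

```java
import java.net.URI;

class CopyCommand {
    // Hypothetical helper: keep scheme and authority (e.g. wasb://container@account...)
    // when present; fall back to the bare path only for scheme-less URIs
    static String toShellArg(URI uri) {
        return uri.getScheme() != null ? uri.toString() : uri.getPath();
    }

    // Hypothetical helper mirroring the argument string passed to _commandRunner.Run
    static String buildCopyCommand(URI source, URI destination) {
        return "dfs -cp " + toShellArg(source) + " " + toShellArg(destination);
    }

    public static void main(String[] args) {
        URI source = URI.create("wasb://mycontainer@myaccount.blob.core.windows.net/data/input.txt");
        URI destination = URI.create("/tmp/input.txt");
        // prints: dfs -cp wasb://mycontainer@myaccount.blob.core.windows.net/data/input.txt /tmp/input.txt
        System.out.println(buildCopyCommand(source, destination));
    }
}
```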
