reef-dev mailing list archives

From Markus Weimer <mar...@weimo.de>
Subject Re: [REEF-1892] HDFS File Copy only uses local HDFS
Date Sun, 24 Sep 2017 17:10:34 GMT
This looks like a really nasty interaction between the cluster
infrastructure and our code:

REEF-1827 became necessary because some clusters have odd DNS setups where
the capitalization of hostnames mattered.
`hdfs://MyFaNcyNaMeNode/some/path.txt` would not evaluate to the same file
as `hdfs://myfancynamenode/some/path.txt`. Stripping the protocol and host
from the URL fixes that.
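To make the REEF-1827 change concrete, here is a minimal sketch using only `java.net.URI`. The class and method names are hypothetical and not REEF's actual code. It shows that once scheme and host are dropped, the two differently capitalized URIs from the example collapse to the same path string.

```java
import java.net.URI;

/**
 * Hypothetical sketch of the REEF-1827 idea: keep only the path component
 * of a file URI so that hostname capitalization can no longer matter.
 */
public final class StripSchemeAndHost {

  /** Returns only the path of the URI, dropping the scheme and authority (host). */
  static String stripSchemeAndHost(final URI uri) {
    return uri.getPath();
  }

  public static void main(final String[] args) {
    final URI upper = URI.create("hdfs://MyFaNcyNaMeNode/some/path.txt");
    final URI lower = URI.create("hdfs://myfancynamenode/some/path.txt");

    // Compared as plain strings, the two URIs differ only in host capitalization.
    System.out.println(upper.toString().equals(lower.toString()));  // false

    // After stripping, both reduce to the same scheme-less path.
    System.out.println(stripSchemeAndHost(upper));  // /some/path.txt
    System.out.println(stripSchemeAndHost(upper).equals(stripSchemeAndHost(lower)));  // true
  }
}
```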

However, that assumes the remaining path is then evaluated with respect to
the right host and protocol. This assumption holds only if the file lives
on the *default* protocol and host of the cluster. But on HDI, the default
filesystem seems to be the cluster's local HDFS, not the WASB filesystem.
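The failure mode can be sketched the same way, again with hypothetical class names and addresses. Resolving the stripped path against the default filesystem URI (roughly what Hadoop does when it qualifies a scheme-less path against `fs.defaultFS`) silently moves the file from WASB to the local HDFS.

```java
import java.net.URI;

/** Hypothetical sketch: a stripped path resolved against the wrong default filesystem. */
public final class DefaultFsResolution {

  /** Resolves a scheme-less absolute path against the cluster's default filesystem URI. */
  static URI resolveAgainstDefaultFs(final URI defaultFs, final String path) {
    return defaultFs.resolve(path);
  }

  public static void main(final String[] args) {
    // Hypothetical addresses: the input lives on WASB, but the default FS is the local HDFS.
    final URI input = URI.create("wasb://container@account.blob.core.windows.net/some/path.txt");
    final URI defaultFs = URI.create("hdfs://headnode:8020/");

    // Stripping scheme and host, as in REEF-1827, keeps only "/some/path.txt".
    final String stripped = input.getPath();
    final URI resolved = resolveAgainstDefaultFs(defaultFs, stripped);

    System.out.println(resolved);  // hdfs://headnode:8020/some/path.txt
    // The file now points at a different filesystem than the one the user gave.
    System.out.println(resolved.getScheme().equals(input.getScheme()));  // false
  }
}
```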

No pretty solution comes to mind. From a principled standpoint, we should
undo REEF-1827: hostnames are supposed to be case-insensitive. However,
clusters that don't adhere to that standard exist, so we might need some
work-around for them.

Markus
