hadoop-hdfs-issues mailing list archives

From Íñigo Goiri (JIRA) <j...@apache.org>
Subject [jira] [Commented] (HDFS-13894) Access HDFS through a proxy and natively
Date Tue, 04 Sep 2018 23:45:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603732#comment-16603732 ]

Íñigo Goiri commented on HDFS-13894:
------------------------------------

Our internal setup is an HDFS cluster on Azure VMs where the Routers are exposed
through a load balancer.
To access metadata, we just point to the load balancer.
However, to access the data itself, we need to use HttpFS, which uses WebHDFS to proxy the
requests to the DNs.

In core-default.xml, we set:
{code}
  <property>
    <name>fs.hdfs.impl</name>
    <value>org.apache.hadoop.hdfs.HdfsWithProxyFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.hdfs.impl</name>
    <value>org.apache.hadoop.fs.AbstractHdfsWithProxyFileSystem</value>
  </property>
  <property>
    <name>fs.hdfs.proxy.azure-cluster-fed</name>
    <value>webhdfs://loadbalancer.azure.com:<PROXY-PORT>/</value>
  </property>
{code}

In hdfs-site.xml, we set:
{code}
  <property>
    <name>dfs.nameservices</name>
    <value>azure-cluster-fed</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.azure-cluster-fed</name>
    <value>routerinternaladdress:<RPC-PORT></value>
  </property>
{code}

Then, the user sets the environment variable {{HDFS_USE_PROXY}} to {{true}} on the client
machine.
The {{HdfsWithProxyFileSystem}} will use the proxy address on the client machine and the native
HDFS address when running inside the firewall.
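
To illustrate the selection described above, here is a minimal, self-contained sketch. It is not the code from the attached patch: the class {{ProxyUriChooser}}, the method {{choose}}, the fallback to the native address when no proxy key is configured, and the port 14000 (a stand-in for <PROXY-PORT>) are all assumptions made for the example. Only the key prefix {{fs.hdfs.proxy.}}, the nameservice name and the {{HDFS_USE_PROXY}} variable come from the configuration above.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not the class from HDFS-13894.000.patch. */
final class ProxyUriChooser {

  /** Key prefix taken from the core-default.xml snippet above. */
  static final String PROXY_PREFIX = "fs.hdfs.proxy.";

  /**
   * Returns the proxy URI configured for the nameservice when useProxy is true
   * and a proxy key exists; otherwise returns the native hdfs:// URI.
   * Falling back to the native URI when the key is missing is an assumption.
   */
  static URI choose(Map<String, String> conf, String nameservice, boolean useProxy) {
    String proxy = conf.get(PROXY_PREFIX + nameservice);
    if (useProxy && proxy != null) {
      return URI.create(proxy);
    }
    return URI.create("hdfs://" + nameservice + "/");
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    // 14000 is a placeholder; the original leaves <PROXY-PORT> unspecified.
    conf.put(PROXY_PREFIX + "azure-cluster-fed",
        "webhdfs://loadbalancer.azure.com:14000/");

    // parseBoolean(null) is false, so an unset variable selects the native URI.
    boolean useProxy = Boolean.parseBoolean(System.getenv("HDFS_USE_PROXY"));
    System.out.println(choose(conf, "azure-cluster-fed", useProxy));
  }
}
```

A plain map stands in for the Hadoop {{Configuration}} object so the sketch runs without Hadoop on the classpath. In a real filesystem implementation the lookup would presumably go through {{Configuration}}.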

> Access HDFS through a proxy and natively
> ----------------------------------------
>
>                 Key: HDFS-13894
>                 URL: https://issues.apache.org/jira/browse/HDFS-13894
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Íñigo Goiri
>            Assignee: Íñigo Goiri
>            Priority: Major
>         Attachments: HDFS-13894.000.patch
>
>
> HDFS deployments are usually behind a firewall where one can access the Namenode but
not the Datanodes. To mitigate this situation, there are proxies that catch the DN requests
(e.g., HttpFS). However, if a user submits a job using the HttpFS endpoint, all the workers
will use that endpoint, which will usually be a bottleneck.
> We should create a new filesystem that supports accessing both:
> * HttpFS for submission from outside the firewall
> * HDFS from within the cluster



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
