hadoop-hdfs-issues mailing list archives

From Íñigo Goiri (JIRA) <j...@apache.org>
Subject [jira] [Commented] (HDFS-13894) Access HDFS through a proxy and natively
Date Tue, 04 Sep 2018 23:45:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603732#comment-16603732 ]

Íñigo Goiri commented on HDFS-13894:

Our internal setup is an HDFS cluster in Azure VMs where the Routers are exposed
through a load balancer.
To access metadata, we just point to the load balancer.
However, to access the data itself, we need to use HttpFS, which uses WebHDFS to proxy the
requests to the DNs.

In core-default.xml, we set:

In hdfs-site.xml, we set:

Then, the user sets the environment variable {{HDFS_USE_PROXY}} to {{true}} on the client machine.
The {{HdfsWithProxyFileSystem}} will use the proxy address on the client machine and the native
HDFS address when running inside the firewall.
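
As a rough illustration of the selection described above (not the code in the attached patch), the choice between the two addresses could be sketched as follows. The class name {{ProxyAwareEndpoint}}, the method {{resolve}}, and both example URIs are hypothetical placeholders; only the {{HDFS_USE_PROXY}} variable comes from this comment.

```java
import java.net.URI;
import java.util.Map;

// Hypothetical sketch: pick the proxy or the native HDFS address
// based on the HDFS_USE_PROXY environment variable.
public class ProxyAwareEndpoint {

    // Returns proxyUri when HDFS_USE_PROXY is "true" (case-insensitive),
    // and nativeUri otherwise, including when the variable is unset.
    public static URI resolve(Map<String, String> env, URI proxyUri, URI nativeUri) {
        String flag = env.get("HDFS_USE_PROXY");
        // Boolean.parseBoolean(null) is false, so an unset variable means native access.
        return Boolean.parseBoolean(flag) ? proxyUri : nativeUri;
    }

    public static void main(String[] args) {
        // Placeholder addresses, assumed for illustration only.
        URI proxy = URI.create("webhdfs://proxy.example.com:14000");
        URI nativeFs = URI.create("hdfs://router.example.com:8020");
        System.out.println(resolve(System.getenv(), proxy, nativeFs));
    }
}
```

With this shape, a machine outside the firewall exports {{HDFS_USE_PROXY=true}} and gets the proxy address, while workers inside the cluster leave the variable unset and keep talking to HDFS natively.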

> Access HDFS through a proxy and natively
> ----------------------------------------
>                 Key: HDFS-13894
>                 URL: https://issues.apache.org/jira/browse/HDFS-13894
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Íñigo Goiri
>            Assignee: Íñigo Goiri
>            Priority: Major
>         Attachments: HDFS-13894.000.patch
> HDFS deployments are usually behind a firewall where one can access the Namenode but
> not the Datanodes. To mitigate this situation, there are proxies that catch the DN requests
> (e.g., HttpFS). However, if a user submits a job using the HttpFS endpoint, all the workers
> will use that endpoint, which will usually be a bottleneck.
> We should create a new filesystem that supports accessing both:
> * HttpFS for submission from outside the firewall
> * HDFS from within the cluster
