hadoop-mapreduce-user mailing list archives

From Gagan Brahmi <gaganbra...@gmail.com>
Subject Re: Namenode automatic failover - how to handle WebHDFS URL?
Date Wed, 08 Jun 2016 19:40:35 GMT
Hi Vamsi,

WebHDFS itself is not HA-aware; however, an HA-aware WebHdfsFileSystem
client is available through https://issues.apache.org/jira/browse/HDFS-5122.
You can try to use it in your code.
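
For a client that talks plain HTTP and cannot use WebHdfsFileSystem, the same
idea (try each NameNode and move on when one is down or in standby) can be
sketched by hand. This is only a rough sketch under assumptions: the function
names, the host:port strings and the injectable `fetch` parameter are made up
for illustration, and it assumes a standby NameNode answers with a JSON
RemoteException whose text contains "StandbyException". It is meant for
metadata operations such as LISTSTATUS that return JSON.

```python
import json
import urllib.error
import urllib.request


def _default_fetch(url, timeout):
    """GET the URL and return (status, body); HTTP error responses are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8")


def webhdfs_get(namenodes, path, op, fetch=_default_fetch, timeout=10):
    """Try each NameNode in order and return the parsed JSON from the first
    one that does not report itself as standby.

    namenodes: list of hypothetical 'host:http_port' strings, one per NameNode.
    """
    last_error = None
    for nn in namenodes:
        url = "http://%s/webhdfs/v1%s?op=%s" % (nn, path, op)
        try:
            status, body = fetch(url, timeout)
        except OSError as err:  # connection refused, timeout, DNS failure
            last_error = err
            continue
        # Assumption: a standby NameNode names StandbyException in its reply.
        if "StandbyException" in body:
            last_error = RuntimeError("%s is in standby state" % nn)
            continue
        if status != 200:
            raise RuntimeError("WebHDFS returned HTTP %d: %s" % (status, body))
        return json.loads(body)
    raise RuntimeError("no active NameNode reachable: %s" % last_error)
```

A call such as webhdfs_get(["namenode-1:50070", "namenode-2:50070"], "/data",
"LISTSTATUS") would then keep working after a failover, at the cost of one
extra request whenever the first host in the list is not the active one.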

Alternatively, you can use either HttpFS or Knox.

HttpFS works with an HA-enabled HDFS cluster. However, it has several
limitations, the biggest of which is likely performance. HttpFS must be
installed as an additional service, and all data is streamed through the
single node it runs on, which can become a performance bottleneck. WebHDFS,
on the other hand, streams data directly from each datanode.
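
Since HttpFS serves the same /webhdfs/v1 REST API, switching a client over is
mostly a matter of pointing the URL at the HttpFS host instead of a NameNode.
A small sketch, assuming HttpFS listens on its default port 14000; the
function name and the example hostnames are hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit


def via_httpfs(webhdfs_url, httpfs_host, httpfs_port=14000):
    """Return the same WebHDFS request URL, addressed to an HttpFS server
    instead of a NameNode. Path and query string are kept unchanged."""
    parts = urlsplit(webhdfs_url)
    netloc = "%s:%d" % (httpfs_host, httpfs_port)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(via_httpfs(
    "http://namenode-1.example.com:50070/webhdfs/v1/data/in?op=LISTSTATUS",
    "httpfs.example.com",
))
```

The application then only ever sees the HttpFS host, and HttpFS resolves the
active NameNode from its own HDFS client configuration.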

The other option is to use the Knox gateway (if already installed) and
configure WebHDFS through it. Knox provides basic failover and retry
functionality for REST API calls made to WebHDFS when HDFS HA has been
configured and enabled.

This does mean you have to install and configure the Knox gateway
service if it is not already installed.
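
With Knox in front, the client addresses the gateway rather than a NameNode,
so the URL stays the same across a failover. A minimal sketch of how such a
URL is put together, assuming Knox's default port 8443 and default gateway
path "gateway"; the function name, the hostname and the topology name
"default" are placeholders for illustration:

```python
from urllib.parse import quote


def knox_webhdfs_url(gateway_host, topology, path, op,
                     gateway_port=8443, gateway_path="gateway"):
    """Build a WebHDFS URL that goes through a Knox gateway topology.

    path is the absolute HDFS path (starting with '/'); it is percent-encoded,
    with '/' left as-is.
    """
    return "https://%s:%d/%s/%s/webhdfs/v1%s?op=%s" % (
        gateway_host, gateway_port, gateway_path, topology, quote(path), op)


print(knox_webhdfs_url("knox.example.com", "default", "/data/in", "LISTSTATUS"))
```

Requests through Knox normally also need credentials and a trusted TLS
certificate, which this sketch leaves out.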

Gagan Brahmi

On Wed, Jun 8, 2016 at 10:35 AM, Vamsi Krishna <vamsi.attluri@gmail.com> wrote:

> Hi,
> How should the WebHDFS URL be handled in case of Namenode automatic
> failover in an HA HDFS cluster?
> When working with the HDFS CLI, replacing ‘<HOST>:<RPC_PORT>’ with the
> ‘dfs.nameservices’ value (from hdfs-site.xml) in the HDFS URI gives me
> the same result as with ‘<HOST>:<RPC_PORT>’.
> By using the ‘dfs.nameservices’ value in the HDFS URI, I do not need to
> change my HDFS CLI commands in case of Namenode automatic failover.
> *Example:*
> hdfs dfs -ls hdfs://<HOST>:<RPC_PORT>/<PATH>
> hdfs dfs -ls hdfs://<DFS.NAMESERVICES>/<PATH>
> *WebHDFS:*
> WebHDFS URL: http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=...
> Is there a way to frame the WebHDFS URL so that we don’t have to change
> the URL (host) in case of Namenode automatic failover (failover from
> namenode-1 to namenode-2)?
> http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS
> *Scenario:*
> I have a web application which uses WebHDFS HTTP request to read data
> files from Hadoop cluster.
> I would like to know if there is a way to make the web application work
> without any downtime in case of Namenode automatic failover (failover
> from namenode-1 to namenode-2)
> Thanks,
> Vamsi Attluri
> --
> Vamsi Attluri
