hadoop-hdfs-user mailing list archives

From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: Namenode automatic failover - how to handle WebHDFS URL?
Date Wed, 08 Jun 2016 19:29:06 GMT
Hello Vamsi,

A general-purpose HTTP client like curl has no knowledge of the HA failover mechanism,
so unfortunately there is no way to craft the URL such that requests fail over
automatically.

However, Hadoop ships with the WebHdfsFileSystem class, which is aware of HA failover.
If your web application is written in Java, or has a reasonable way to bridge over to Java,
then you could take advantage of that class.  This is the class used when Hadoop
shell commands reference a URI with the webhdfs: scheme.  For example:

hdfs dfs -ls webhdfs://127.0.0.1:50070/

You could also get an instance of WebHdfsFileSystem by calling FileSystem#get with a Configuration
object that sets fs.defaultFS to a webhdfs: URI, or by calling the overload of FileSystem#get
that accepts an explicit URI argument.
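
If it helps, here is a minimal sketch of that second approach.  It assumes the client's
classpath contains an hdfs-site.xml with the HA nameservice configured (dfs.nameservices,
dfs.ha.namenodes.<nameservice>, the HTTP addresses of both NameNodes, and a failover proxy
provider); the nameservice name "mycluster" is just a placeholder for your actual
dfs.nameservices value.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WebHdfsHaExample {
  public static void main(String[] args) throws Exception {
    // Loads core-site.xml/hdfs-site.xml from the classpath, including the
    // HA nameservice definition the client needs to locate both NameNodes.
    Configuration conf = new Configuration();

    // Using the logical nameservice instead of host:port lets the
    // client-side failover logic find the active NameNode.
    // "mycluster" is a placeholder for your dfs.nameservices value.
    URI uri = URI.create("webhdfs://mycluster/");

    // FileSystem#get returns a WebHdfsFileSystem for the webhdfs: scheme.
    FileSystem fs = FileSystem.get(uri, conf);
    try {
      for (FileStatus status : fs.listStatus(new Path("/"))) {
        System.out.println(status.getPath());
      }
    } finally {
      fs.close();
    }
  }
}

The same thing works by setting fs.defaultFS to webhdfs://mycluster/ on the Configuration
and calling the single-argument FileSystem#get(Configuration).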

--Chris Nauroth

From: Vamsi Krishna <vamsi.attluri@gmail.com>
Date: Wednesday, June 8, 2016 at 10:35 AM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Namenode automatic failover - how to handle WebHDFS URL?


Hi,

How should I handle the WebHDFS URL in the case of Namenode automatic failover in an HA HDFS cluster?


HDFS CLI:

HDFS URI: hdfs://<HOST>:<RPC_PORT>/<PATH>

When working with the HDFS CLI, replacing '<HOST>:<RPC_PORT>' in the HDFS URI with the
'dfs.nameservices' value (from hdfs-site.xml) gives me the same result as using '<HOST>:<RPC_PORT>'.

By using 'dfs.nameservices' in the HDFS URI, I do not need to change my HDFS CLI commands
in case of Namenode automatic failover.

Example:

hdfs dfs -ls hdfs://<HOST>:<RPC_PORT>/<PATH>

hdfs dfs -ls hdfs://<dfs.nameservices>/<PATH>


WebHDFS:

WebHDFS URL: http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=...

Is there a way to construct the WebHDFS URL so that we don't have to change the URL (host) in
case of Namenode automatic failover (failover from namenode-1 to namenode-2)?

http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS

Scenario:
I have a web application that uses WebHDFS HTTP requests to read data files from a Hadoop cluster.
I would like to know if there is a way to make the web application work without any downtime
in case of Namenode automatic failover (failover from namenode-1 to namenode-2).

Thanks,
Vamsi Attluri
