hadoop-common-issues mailing list archives

From "Elek, Marton (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-16064) Load configuration values from external sources
Date Tue, 22 Jan 2019 13:23:00 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elek, Marton updated HADOOP-16064:
----------------------------------
    Attachment: HADOOP-16064.001.patch

> Load configuration values from external sources
> -----------------------------------------------
>
>                 Key: HADOOP-16064
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16064
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>         Attachments: HADOOP-16064.001.patch
>
>
> This is a proposal to improve Configuration.java to load configuration from external sources (a Kubernetes config map, an external HTTP request, any cluster manager such as Ambari, etc.)
> I will attach a patch to illustrate the proposed solution, but please comment on the concept first; the patch is just a PoC and not fully implemented.
> *Goals:*
>  * Load the configuration files (core-site.xml/hdfs-site.xml/...) from external locations instead of the classpath (the classpath remains the default)
>  * Make the configuration loading extensible
>  * Do it in a backward-compatible way, with minimal change to the existing Configuration.java
> *Use-cases:*
>  1.) Load configuration from the namenode (http://namenode:9878/conf). With this approach only the namenode needs to be configured; the other components require only the URL of the namenode.
>  2.) Read configuration directly from a Kubernetes config map (or Mesos).
>  3.) Read configuration from any external cluster manager (such as Apache Ambari or an equivalent).
>  4.) As of now, in the Hadoop docker images we transform environment variables (such as HDFS-SITE.XML_fs.defaultFS) into configuration XML files with the help of a Python script. With the proposed implementation it would be possible to read the configuration directly from the system environment variables.
> *Problem:*
> The existing Configuration.java can read configuration from multiple sources, but most of the time it's used to load predefined config names ("core-site.xml" and "hdfs-site.xml") without a configuration location. In this case the files are loaded from the classpath.
> I propose to add an additional option to define the default location of core-site.xml and hdfs-site.xml (or any configuration which is defined by string name), so that they can be loaded from external sources instead of the classpath.
> The configuration loading requires both an implementation and configuration (where the external configs are). We can't use the regular configuration to configure the config loader (chicken/egg problem).
> I propose to use a new environment variable, HADOOP_CONF_SOURCE.
> The environment variable could contain a URL, where the scheme of the URL selects the config source implementation and all the other parts configure the access to the resource.
> Examples:
> HADOOP_CONF_SOURCE=hadoop-http://namenode:9878/conf
> HADOOP_CONF_SOURCE=env://prefix
> HADOOP_CONF_SOURCE=k8s://config-map-name
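> Just to make the bootstrap step concrete, a minimal sketch of how the environment variable could be read and parsed (the class and method names here are only illustrative, and treating the hadoop- prefix as part of the URI scheme is an assumption, not something the patch mandates):
> {code:java}
> import java.net.URI;
>
> public final class ConfSourceBootstrap {
>
>   /**
>    * Reads HADOOP_CONF_SOURCE and returns the parsed URI, or null if it is
>    * not set (in which case the classpath-based loading remains the default).
>    */
>   public static URI confSourceUri() {
>     String raw = System.getenv("HADOOP_CONF_SOURCE");
>     if (raw == null || raw.isEmpty()) {
>       return null;
>     }
>     return URI.create(raw);
>   }
>
>   public static void main(String[] args) {
>     URI uri = confSourceUri();
>     if (uri != null) {
>       // env://prefix -> "env", k8s://config-map-name -> "k8s",
>       // hadoop-http://namenode:9878/conf -> "hadoop-http"
>       System.out.println("config source scheme: " + uri.getScheme());
>     }
>   }
> }{code}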
> The ConfigurationSource interface can be as easy as:
> {code:java}
> import java.io.IOException;
> import java.net.URI;
> import java.util.List;
>
> /**
>  * Interface to load Hadoop configuration from a custom location.
>  */
> public interface ConfigurationSource {
>
>   /**
>    * Will be called once with the configured configuration source URL.
>    *
>    * @param uri the URI defined by the HADOOP_CONF_SOURCE environment variable
>    * @throws IOException if the source can't be initialized
>    */
>   void initialize(URI uri) throws IOException;
>
>   /**
>    * Will be called to load a specific configuration resource.
>    *
>    * @param name name of the configuration resource (eg. hdfs-site.xml)
>    * @return list of the loaded configuration keys and values
>    */
>   List<ParsedItem> readConfiguration(String name);
> }{code}
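> ParsedItem above is just a key/value holder; a minimal sketch of what such a value object could look like (the real class and its field names are in the attached patch, so treat this as an assumption):
> {code:java}
> /**
>  * One configuration entry returned by a ConfigurationSource. Sketch only;
>  * see the attached patch for the real class.
>  */
> public class ParsedItem {
>
>   private final String key;
>   private final String value;
>
>   public ParsedItem(String key, String value) {
>     this.key = key;
>     this.value = value;
>   }
>
>   public String getKey() {
>     return key;
>   }
>
>   public String getValue() {
>     return value;
>   }
> }{code}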
> We can choose the right implementation based on the scheme of the URI, using the Java Service Provider Interface mechanism (META-INF/services/org.apache.hadoop.conf.ConfigurationSource); a sketch of such a lookup follows below.
> It can be done with minimal modification to Configuration.java (see the attached patch as an example).
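> A hedged sketch of how that scheme-based lookup could work with java.util.ServiceLoader (the scheme() method and the loader class name are illustrative additions, not necessarily what the patch does):
> {code:java}
> import java.io.IOException;
> import java.net.URI;
> import java.util.ServiceLoader;
>
> public final class ConfigurationSourceLoader {
>
>   /**
>    * Picks the ConfigurationSource implementation whose scheme matches the
>    * scheme of the HADOOP_CONF_SOURCE URI and initializes it. Assumes an
>    * illustrative scheme() method on the implementations.
>    */
>   public static ConfigurationSource load(URI uri) throws IOException {
>     for (ConfigurationSource source
>         : ServiceLoader.load(ConfigurationSource.class)) {
>       if (source.scheme().equals(uri.getScheme())) {
>         source.initialize(uri);
>         return source;
>       }
>     }
>     throw new IOException(
>         "No ConfigurationSource registered for scheme " + uri.getScheme());
>   }
> }{code}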
>  The patch contains two example implementations:
> *hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/location/Env.java*
> This can load configuration from environment variables based on a naming convention (eg. HDFS-SITE.XML_hdfs.dfs.key=value).
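> A minimal sketch of how such an env-based source could apply that naming convention (illustrative only; the real implementation is Env.java from the patch):
> {code:java}
> import java.net.URI;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Map;
>
> /**
>  * Turns environment variables like HDFS-SITE.XML_dfs.replication=3 into
>  * configuration entries for the resource "hdfs-site.xml". Sketch only.
>  */
> public class EnvConfigurationSource implements ConfigurationSource {
>
>   @Override
>   public void initialize(URI uri) {
>     // Nothing to do: the process environment itself is the source.
>   }
>
>   @Override
>   public List<ParsedItem> readConfiguration(String name) {
>     // "hdfs-site.xml" -> "HDFS-SITE.XML_"
>     String prefix = name.toUpperCase() + "_";
>     List<ParsedItem> result = new ArrayList<>();
>     for (Map.Entry<String, String> entry : System.getenv().entrySet()) {
>       if (entry.getKey().startsWith(prefix)) {
>         String key = entry.getKey().substring(prefix.length());
>         result.add(new ParsedItem(key, entry.getValue()));
>       }
>     }
>     return result;
>   }
> }{code}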
> *hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/location/HadoopWeb.java*
>  This implementation can load the configuration from the /conf servlet of any Hadoop component.
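> A rough sketch of what such an HTTP-based source boils down to: fetch the output of the /conf servlet (assumed here to be its XML format) and convert the property entries. The class below is illustrative (the real one is HadoopWeb.java from the patch) and it ignores the name parameter, since /conf returns the merged configuration:
> {code:java}
> import java.io.InputStream;
> import java.net.URI;
> import java.util.ArrayList;
> import java.util.List;
> import javax.xml.parsers.DocumentBuilderFactory;
> import org.w3c.dom.Document;
> import org.w3c.dom.Element;
> import org.w3c.dom.NodeList;
>
> /**
>  * Downloads the XML output of the /conf servlet of a Hadoop daemon and
>  * converts it to ParsedItem entries. Sketch only.
>  */
> public class WebConfigurationSource implements ConfigurationSource {
>
>   private URI base;
>
>   @Override
>   public void initialize(URI uri) {
>     // e.g. hadoop-http://namenode:9878/conf -> http://namenode:9878/conf
>     this.base = URI.create(uri.toString().replaceFirst("^hadoop-", ""));
>   }
>
>   @Override
>   public List<ParsedItem> readConfiguration(String name) {
>     List<ParsedItem> result = new ArrayList<>();
>     try (InputStream in = base.toURL().openStream()) {
>       Document doc = DocumentBuilderFactory.newInstance()
>           .newDocumentBuilder().parse(in);
>       NodeList properties = doc.getElementsByTagName("property");
>       for (int i = 0; i < properties.getLength(); i++) {
>         Element property = (Element) properties.item(i);
>         result.add(new ParsedItem(
>             property.getElementsByTagName("name").item(0).getTextContent(),
>             property.getElementsByTagName("value").item(0).getTextContent()));
>       }
>     } catch (Exception e) {
>       throw new RuntimeException("Can't read configuration from " + base, e);
>     }
>     return result;
>   }
> }{code}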
>  





