Date: Fri, 6 Nov 2015 22:35:12 +0000 (UTC)
From: "Haohui Mai (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-9117) Config file reader / options classes for libhdfs++

    [ https://issues.apache.org/jira/browse/HDFS-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994598#comment-14994598 ]

Haohui Mai commented on HDFS-9117:
----------------------------------

bq. As an example, let's say we are writing a native replacement for the dfs tool using the native libhdfs++ codebase (not the libhdfs compatibility layer) that can do "-ls", "-copyFromLocal", etc.
bq. To provide Least Astonishment for our consumers, they would expect that a properly configured Hadoop node [with HADOOP_HOME pointing to /etc/hadoop-2.9.9 and its config files] could run "hdfspp -ls /tmp" and have it automatically find the NN and configure the communication parameters correctly to talk to their cluster.

Unfortunately, that assumption breaks down in many ways -- the behavior is entirely implementation-defined. For example, there is the question of whether {{HADOOP_HOME}} or {{HADOOP_PREFIX}} should be honored. Configuration files are only required to be on the {{CLASSPATH}}, not necessarily in the {{HADOOP_HOME}} directory. Different vendors may have changed their scripts and put the configuration in different places. Scripts also evolve across versions -- we have very different scripts between trunk and branch-2. While this functionality is definitely useful in the libhdfs compatibility layer, I'm doubtful it should be added to the core part of the library given all this complexity.

Therefore I believe the focus of the library should be providing mechanisms to interact with HDFS, not concrete policy (e.g., the location of the configuration) on how to interact. We don't yet have any library that implements the protocols and mechanisms for interacting with HDFS (which is the reusable part). The policy is highly customized across environments, but it can be worked around easily (which is the less reusable part).

bq. given this context, do you agree that we need to support libhdfs++ compatibility with the hdfs-site.xml files that are already deployed at customer

There are two levels of APIs when talking about libhdfs++ APIs. The core API focuses on providing mechanisms to interact with HDFS, such as implementing Hadoop RPC and the DataTransferProtocol. The API you're referring to would be a convenience API for libhdfs++. The functionality is definitely helpful, but it can be provided as a utility helper instead of being baked into the main contract of libhdfs++.
My suggestion is the following:

1. Focus this jira on the code that parses XML from strings (which is the core functionality of parsing configuration). It should not contain any file operations.
2. Separate the tasks of searching through paths, reading files, etc. into different jiras. For now it makes sense to put them alongside the {{libhdfs}} compatibility layer. Since that is an implementation detail, I believe we can go through it quickly. At a later point we can promote the code to a common library, once we have a proposal for what the libhdfs++ convenience APIs look like.

> Config file reader / options classes for libhdfs++
> --------------------------------------------------
>
>                 Key: HDFS-9117
>                 URL: https://issues.apache.org/jira/browse/HDFS-9117
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: HDFS-8707
>            Reporter: Bob Hansen
>            Assignee: Bob Hansen
>         Attachments: HDFS-9117.HDFS-8707.001.patch, HDFS-9117.HDFS-8707.002.patch, HDFS-9117.HDFS-8707.003.patch, HDFS-9117.HDFS-8707.004.patch, HDFS-9117.HDFS-8707.005.patch, HDFS-9117.HDFS-8707.006.patch, HDFS-9117.HDFS-8707.008.patch, HDFS-9117.HDFS-8707.009.patch, HDFS-9117.HDFS-8707.010.patch, HDFS-9117.HDFS-8707.011.patch, HDFS-9117.HDFS-8707.012.patch, HDFS-9117.HDFS-9288.007.patch
>
> For environmental compatibility with HDFS installations, libhdfs++ should be able to read the configuration from Hadoop XML files and behave in line with the Java implementation.
> Most notably, machine names and ports should be readable from Hadoop XML configuration files.
> Similarly, an internal Options architecture for libhdfs++ should be developed to efficiently transport the configuration information within the system.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)