Date: Fri, 6 Nov 2015 22:35:12 +0000 (UTC)
From: "Haohui Mai (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-9117) Config file reader / options classes for libhdfs++

    [ https://issues.apache.org/jira/browse/HDFS-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994598#comment-14994598 ]

Haohui Mai commented on HDFS-9117:
----------------------------------

bq. As an example, let's say we are writing a native replacement for the dfs tool using the native libhdfs++ codebase (not the libhdfs compatibility layer) that can do "-ls", "-copyFromLocal", etc.
bq. To provide Least Astonishment for our consumers, they would expect that a properly configured Hadoop node [with HADOOP_HOME pointing to /etc/hadoop-2.9.9 and its config files] could run "hdfspp -ls /tmp" and have it automatically find the NN and configure the communication parameters correctly to talk to their cluster.

Unfortunately, that assumption breaks down in many ways -- the behavior is entirely implementation-defined. For example, there is the question of whether {{HADOOP_HOME}} or {{HADOOP_PREFIX}} should be honored. Configuration files are only required to be on the {{CLASSPATH}}, not necessarily in the {{HADOOP_HOME}} directory. Different vendors may have changed their scripts and put the configuration in different places. Scripts also evolve across versions -- we have very different scripts between trunk and branch-2. While this functionality is definitely useful in the libhdfs compatibility layer, I'm doubtful it should be added to the core part of the library given all this complexity.

Therefore I believe the focus of the library should be providing mechanisms to interact with HDFS, not concrete policy (e.g., the location of the configuration) on how to interact. We don't yet have any library that implements the protocols and mechanisms for interacting with HDFS (which is the reusable part). The policy is highly customized across environments, but it can be worked around easily (which is the less reusable part).

bq. given this context, do you agree that we need to support libhdfs++ compatibility with the hdfs-site.xml files that are already deployed at customer

There are two levels of APIs when talking about libhdfs++ APIs. The core API focuses on providing mechanisms to interact with HDFS, such as implementing Hadoop RPC and the DataTransferProtocol. The API you're referring to would be a convenience API for libhdfs++. The functionality is definitely helpful, but it can be provided as a utility helper instead of being baked into the main contract of libhdfs++.
My suggestion is the following:

1. Focus this jira on the code that parses XML from strings (which is the core functionality of parsing configuration). It should not contain any file operations.
2. Separate the tasks of searching through paths, reading files, etc. into different jiras. For now it makes sense to put them alongside the {{libhdfs}} compatibility layer. Since that is an implementation detail, I believe we can go through it quickly. At a later point we can promote the code to a common library, once we have a proposal for what the libhdfs++ convenience APIs look like.

> Config file reader / options classes for libhdfs++
> --------------------------------------------------
>
>                 Key: HDFS-9117
>                 URL: https://issues.apache.org/jira/browse/HDFS-9117
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: HDFS-8707
>            Reporter: Bob Hansen
>            Assignee: Bob Hansen
>         Attachments: HDFS-9117.HDFS-8707.001.patch, HDFS-9117.HDFS-8707.002.patch, HDFS-9117.HDFS-8707.003.patch, HDFS-9117.HDFS-8707.004.patch, HDFS-9117.HDFS-8707.005.patch, HDFS-9117.HDFS-8707.006.patch, HDFS-9117.HDFS-8707.008.patch, HDFS-9117.HDFS-8707.009.patch, HDFS-9117.HDFS-8707.010.patch, HDFS-9117.HDFS-8707.011.patch, HDFS-9117.HDFS-8707.012.patch, HDFS-9117.HDFS-9288.007.patch
>
> For environmental compatibility with HDFS installations, libhdfs++ should be able to read the configuration from Hadoop XML files and behave in line with the Java implementation.
> Most notably, machine names and ports should be readable from Hadoop XML configuration files.
> Similarly, an internal Options architecture for libhdfs++ should be developed to efficiently transport the configuration information within the system.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)