hadoop-hdfs-issues mailing list archives

From "Haohui Mai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9117) Config file reader / options classes for libhdfs++
Date Fri, 06 Nov 2015 22:35:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994598#comment-14994598 ]

Haohui Mai commented on HDFS-9117:

bq. As an example, let's say we are writing a native replacement for the dfs tool using the
native libhdfs++ codebase (not the libhdfs compatibility layer) that can do "-ls" and "-copyFromLocal",
etc. To provide Least Astonishment for our consumers, they would expect that a properly configured
Hadoop node [with the HADOOP_HOME pointing to /etc/hadoop-2.9.9 and its config files] could
run "hdfspp -ls /tmp" and have it automatically find the NN and configure the communications
parameters correctly to talk to their cluster.

Unfortunately this assumption is broken in many ways -- the behavior is fully implementation-defined.
For example, it is unclear whether {{HADOOP_HOME}} or {{HADOOP_PREFIX}} should be chosen.
Configuration files are only required to be on the {{CLASSPATH}}, not necessarily in
the {{HADOOP_HOME}} directory. Different vendors might have changed their scripts and put
the configuration in different places. Scripts also evolve across versions -- we have very
different scripts between trunk and branch-2.
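To illustrate the ambiguity, consider a hypothetical discovery routine (the helper names below are invented for illustration, not libhdfs++ code): even deciding where to start looking forces arbitrary policy choices between environment variables, and every {{CLASSPATH}} entry would have to be probed separately.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Prefer HADOOP_PREFIX, fall back to HADOOP_HOME -- an arbitrary
// policy choice; other tools make the opposite one.
std::string ResolveHadoopRoot() {
  if (const char *prefix = std::getenv("HADOOP_PREFIX")) return prefix;
  if (const char *home = std::getenv("HADOOP_HOME")) return home;
  return "";
}

// Split a ':'-separated CLASSPATH into candidate directories, each of
// which would then have to be probed for hdfs-site.xml.
std::vector<std::string> SplitClasspath(const std::string &cp) {
  std::vector<std::string> parts;
  std::string::size_type start = 0;
  while (start <= cp.size()) {
    std::string::size_type end = cp.find(':', start);
    if (end == std::string::npos) end = cp.size();
    if (end > start) parts.push_back(cp.substr(start, end - start));
    start = end + 1;
  }
  return parts;
}
```

Every branch in code like this encodes a policy that some deployment somewhere contradicts.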

While it is definitely useful in the libhdfs compatibility layer, I'm doubtful it should be
added to the core part of the library due to all this complexity.

Therefore I believe the focus of the library should be on providing mechanisms to interact
with HDFS, not concrete policy (e.g., the location of the configuration) on how to interact.
We don't yet have any libraries that implement the protocols and mechanisms to interact with
HDFS (which is the reusable part). The policy is highly customized across environments,
but it can be worked around easily (which is the less reusable part).

bq. given this context, do you agree that we need to support libhdfs++ compatibility with
the hdfs-site.xml files that are already deployed at customer 

There are two levels of APIs when talking about libhdfs++. The core API focuses on providing
mechanisms to interact with HDFS, such as implementing Hadoop RPC and the DataTransferProtocol.
The API that you're referring to would be a convenience API for libhdfs++. The functionality
is definitely helpful, but it can be provided as a utility helper instead of being baked into
the main contract of libhdfs++.

My suggestion is the following:

1. Focus this jira on the code that parses XML from strings (which is the core functionality
of configuration parsing). It should not contain any file operations.
2. Separate the tasks of searching through paths, reading files, etc. into different jiras.
For now it makes sense to put them alongside the {{libhdfs}} compatibility layer. Since it's
an implementation detail I believe we can move through it quickly. At a later point we can
promote the code to a common library once we have a proposal for how the libhdfs++ convenience
APIs should look.
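The shape of step 1 (parse XML from strings, no file I/O) could look roughly like the sketch below. This is illustrative only -- the function name is invented and the real patch would use a proper XML parser rather than string scanning -- but it shows a pure string-to-map contract with no file operations:

```cpp
#include <map>
#include <string>

// Parse Hadoop-style <property><name>..</name><value>..</value></property>
// entries from an in-memory XML string into a key/value map.
// Hypothetical sketch: no file I/O, the caller supplies the string.
std::map<std::string, std::string>
ParseHadoopConfig(const std::string &xml) {
  std::map<std::string, std::string> conf;
  std::string::size_type pos = 0;
  while ((pos = xml.find("<property>", pos)) != std::string::npos) {
    std::string::size_type prop_end = xml.find("</property>", pos);
    if (prop_end == std::string::npos) break;
    // Extract the text between <tag> and </tag> inside this property.
    auto field = [&](const std::string &tag) -> std::string {
      std::string open = "<" + tag + ">", close = "</" + tag + ">";
      std::string::size_type s = xml.find(open, pos);
      if (s == std::string::npos || s > prop_end) return "";
      s += open.size();
      std::string::size_type e = xml.find(close, s);
      if (e == std::string::npos || e > prop_end) return "";
      return xml.substr(s, e - s);
    };
    std::string name = field("name");
    if (!name.empty()) conf[name] = field("value");
    pos = prop_end + 11;  // advance past "</property>"
  }
  return conf;
}
```

Path searching and file reading would then layer on top of this in the compatibility layer, per step 2.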

> Config file reader / options classes for libhdfs++
> --------------------------------------------------
>                 Key: HDFS-9117
>                 URL: https://issues.apache.org/jira/browse/HDFS-9117
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: HDFS-8707
>            Reporter: Bob Hansen
>            Assignee: Bob Hansen
>         Attachments: HDFS-9117.HDFS-8707.001.patch, HDFS-9117.HDFS-8707.002.patch, HDFS-9117.HDFS-8707.003.patch,
HDFS-9117.HDFS-8707.004.patch, HDFS-9117.HDFS-8707.005.patch, HDFS-9117.HDFS-8707.006.patch,
HDFS-9117.HDFS-8707.008.patch, HDFS-9117.HDFS-8707.009.patch, HDFS-9117.HDFS-8707.010.patch,
HDFS-9117.HDFS-8707.011.patch, HDFS-9117.HDFS-8707.012.patch, HDFS-9117.HDFS-9288.007.patch
> For environmental compatibility with HDFS installations, libhdfs++ should be able to
read the configurations from Hadoop XML files and behave in line with the Java implementation.
> Most notably, machine names and ports should be readable from Hadoop XML configuration.
> Similarly, an internal Options architecture for libhdfs++ should be developed to efficiently
transport the configuration information within the system.

This message was sent by Atlassian JIRA
