hadoop-hdfs-issues mailing list archives

From "Haohui Mai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
Date Thu, 19 Feb 2015 23:55:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328351#comment-14328351 ]

Haohui Mai commented on HDFS-6994:

bq. That is interesting. Why do you feel libhdfs3 and the Java client cannot support "thousands
of files concurrently"? What are you doing differently that you believe will be better for
this application?

I should have worded that more precisely. The question is not whether it can be done, but
whether it can be done efficiently. The Java client currently exposes thread-based, synchronous
APIs. They are simpler to program against, but keeping latency low requires one thread per
stream. In resource-constrained environments (e.g., applications running inside a YARN container)
this becomes an important concern, because accessing thousands of files concurrently requires
thousands of threads.
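
To make the thread-per-stream cost concrete, here is a minimal sketch. It uses local temporary files and plain java.io streams as stand-ins; it does not touch HDFS or any Hadoop API, and the class and method names (ThreadPerStream, readAll) are hypothetical. The only point is that with blocking reads, N concurrent streams pin N threads.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class ThreadPerStream {
    /**
     * Reads every file with a blocking stream, one dedicated thread per file.
     * Returns the total number of bytes read across all files.
     */
    static long readAll(List<Path> paths) throws InterruptedException {
        AtomicLong total = new AtomicLong();
        List<Thread> threads = new ArrayList<>();
        for (Path p : paths) {
            Thread t = new Thread(() -> {
                try (InputStream in = Files.newInputStream(p)) {
                    byte[] buf = new byte[4096];
                    int n;
                    // read() blocks this thread until data is available or EOF
                    while ((n = in.read(buf)) != -1) {
                        total.addAndGet(n);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            threads.add(t);
            t.start();
        }
        // The thread count grows linearly with the number of open streams.
        for (Thread t : threads) {
            t.join();
        }
        return total.get();
    }

    public static void main(String[] args) throws Exception {
        List<Path> paths = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            Path p = Files.createTempFile("stream", ".bin");
            Files.write(p, new byte[1000]);
            paths.add(p);
        }
        System.out.println(readAll(paths) + " bytes, " + paths.size() + " threads");
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

With 8 files this starts 8 threads; with thousands of files it would start thousands, each with its own stack.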

libhdfs / libhdfs3 suffer from the same problem, because their APIs closely follow the APIs
of the Java client we have today.

Fundamentally, the issue is tied to the synchronous APIs rather than to any specific implementation.
Event-based, asynchronous APIs, by contrast, are harder to program against, but they can be
implemented with a bounded amount of resources. Applications that need to access thousands of
files concurrently in a resource-constrained environment can benefit from this.
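
As a rough illustration of the callback style, the sketch below reads many files through java.nio's AsynchronousFileChannel with a fixed-size thread pool. This is an assumption-laden stand-in: it uses local temporary files, not HDFS, and it is not a proposal for what an asynchronous HDFS client API would look like. The names BoundedAsyncRead and readAll are hypothetical. Note also that the JDK may itself service file reads on the pool threads; the sketch only shows that the number of threads is set by the pool size rather than by the number of open files.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class BoundedAsyncRead {
    /** Reads all files via completion callbacks on a pool of poolSize threads. */
    static long readAll(List<Path> paths, int poolSize)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicLong total = new AtomicLong();
        CountDownLatch done = new CountDownLatch(paths.size());
        List<AsynchronousFileChannel> channels = new ArrayList<>();
        try {
            for (Path p : paths) {
                AsynchronousFileChannel ch = AsynchronousFileChannel.open(
                        p, EnumSet.of(StandardOpenOption.READ), pool);
                channels.add(ch);
                readFrom(ch, 0L, ByteBuffer.allocate(4096), total, done);
            }
            done.await(); // wait until every file reached EOF or failed
        } finally {
            for (AsynchronousFileChannel ch : channels) {
                ch.close();
            }
            pool.shutdown();
        }
        return total.get();
    }

    /** Issues one read; the handler re-issues the next read until EOF. */
    private static void readFrom(AsynchronousFileChannel ch, long pos, ByteBuffer buf,
                                 AtomicLong total, CountDownLatch done) {
        ch.read(buf, pos, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer n, Void attachment) {
                if (n < 0) {          // end of file
                    done.countDown();
                    return;
                }
                total.addAndGet(n);
                buf.clear();          // reuse the same buffer for the next chunk
                readFrom(ch, pos + n, buf, total, done);
            }

            @Override
            public void failed(Throwable exc, Void attachment) {
                done.countDown();     // give up on this file; total stays partial
            }
        });
    }

    public static void main(String[] args) throws Exception {
        List<Path> paths = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            Path p = Files.createTempFile("async", ".bin");
            Files.write(p, new byte[1000]);
            paths.add(p);
        }
        System.out.println(readAll(paths, 2) + " bytes read with a pool of 2 threads");
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

The control flow is visibly harder to follow than a blocking loop, which is the programming cost mentioned above; in exchange, 100 open files are served by 2 pool threads.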

> libhdfs3 - A native C/C++ HDFS client
> -------------------------------------
>                 Key: HDFS-6994
>                 URL: https://issues.apache.org/jira/browse/HDFS-6994
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs-client
>            Reporter: Zhanwei Wang
>            Assignee: Zhanwei Wang
>         Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
> Hi All
> I just got the permission to open source libhdfs3, which is a native C/C++ HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol.
> libhdfs3 provide the libhdfs style C interface and a C++ interface. Support both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos authentication.
> libhdfs3 is currently used by HAWQ of Pivotal
> I'd like to integrate libhdfs3 into HDFS source code to benefit others.
> You can find libhdfs3 code from github
> https://github.com/PivotalRD/libhdfs3
> http://pivotalrd.github.io/libhdfs3/

This message was sent by Atlassian JIRA