hadoop-common-issues mailing list archives

From "Haohui Mai (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-10389) Native RPCv9 client
Date Mon, 16 Jun 2014 18:48:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032698#comment-14032698 ]

Haohui Mai edited comment on HADOOP-10389 at 6/16/14 6:47 PM:
--------------------------------------------------------------

bq. If the code is so impenetrable that review is truly impossible, please consider demonstrating
the virtues of C++11 in a competing branch.

As a weekend project, I've uploaded a proof-of-concept patch in C++ for this jira. It implements
a synchronous RPC engine and the {{exists()}} call for libhdfs.

The main point is not (and was never intended to be) to demonstrate that C++ is superior to C,
but to show that it is practical to use current technology to improve code quality and to cut
maintenance costs down the road. In my personal opinion, it is important to ensure that
the code can be easily maintained by the community. The patch demonstrates a viable way to
approach this goal.

Just a couple of points to highlight:

# Simpler implementation. Thanks to the standard libraries, there is no need to put things
like RB trees, hash tables, and linked lists into the patch and to ask for reviews of them.
Though not fully equivalent, the patch is an order of magnitude smaller in terms of size.
# Automatic resource management. Explicit resource management (e.g., {{malloc()}} / {{free()}}
and closing sockets) is no longer required. The lifetime of a resource matches the scope of
the code that uses it. It is a systematic way to avoid resource leaks.
# Stronger claims from the type system. A modern language allows the code to be written in
a mostly type-safe way, where the type system is able to show that the majority of the code
is safe. There is only one cast required in the code compared to many (each in the linked
list) in the old implementation. New constructs like {{std::array}} also make it possible to
integrate bounds checks into the code to help prevent buffer overflows.
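
To make the second and third points concrete, here is a minimal C++11 sketch. It is not taken from the patch: {{Connection}}, {{g_open_handles}}, {{header_byte()}} and {{probe()}} are hypothetical names invented for illustration, and the counter merely stands in for a real socket so that a leak would be observable. It only assumes standard library facilities ({{std::unique_ptr}}, {{std::array::at()}}).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for a socket: counts live handles so a leak is visible.
int g_open_handles = 0;

class Connection {
public:
    Connection() { ++g_open_handles; }    // acquire the resource
    ~Connection() { --g_open_handles; }   // release it on every exit path
    Connection(const Connection&) = delete;             // exactly one owner
    Connection& operator=(const Connection&) = delete;
};

// Reads one byte of a fixed-size header. at() throws std::out_of_range
// for a bad index instead of reading past the end of the buffer.
std::uint8_t header_byte(const std::array<std::uint8_t, 4>& header,
                         std::size_t i) {
    return header.at(i);
}

// Returns true if the index is valid. Neither path needs explicit cleanup:
// conn and scratch are released when the function returns.
bool probe(std::size_t index) {
    Connection conn;
    std::unique_ptr<char[]> scratch(new char[64]);  // replaces malloc()/free()
    scratch[0] = '\0';
    const std::array<std::uint8_t, 4> header = {{1, 2, 3, 4}};  // arbitrary bytes
    try {
        (void)header_byte(header, index);
        return true;
    } catch (const std::out_of_range&) {
        return false;  // out-of-bounds access was caught, not executed
    }
}
```

Calling {{probe(2)}} succeeds and {{probe(9)}} reports the bad index; in both cases the handle count is back to zero afterwards, with no cleanup code written by hand.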

Just to reiterate, by no means am I trying to claim that the current patch is perfect and free
of bugs. People, myself included, make mistakes all the time. A modern tool, however, can help
people catch mistakes at the beginning of the development cycle and avoid them.
Thoughts?


was (Author: wheat9):
bq. If the code is so impenetrable that review is truly impossible, please consider demonstrating
the virtues of C++11 in a competing branch.

As a weekend project, I've uploaded a proof-of-concept patch in C++ for this jira. It implements
a synchronous RPC engine and the {{exists()}} call for libhdfs.

The main point is not (and was never intended to be) to demonstrate that C++ is superior to C,
but to show that it is practical to use current technology to improve code quality and to cut
maintenance costs down the road. In my personal opinion, it is important to ensure that
the code can be easily maintained by the community. The patch demonstrates a viable way to
approach this goal.

Just a couple of points to highlight:

# Simpler implementation. Thanks to the standard libraries, there is no need to put things
like RB trees, hash tables, and linked lists into the patch and to ask for reviews of them.
Though not fully equivalent, the patch is an order of magnitude smaller in terms of size.
# Automatic resource management. Explicit resource management (e.g., {{malloc()}} / {{free()}}
and closing sockets) is no longer required. The lifetime of a resource matches the scope of
the code that uses it. It is a systematic way to avoid resource leaks.
# Stronger claims from the type system. A modern language allows the code to be written in
a mostly type-safe way, where the type system is able to show that the majority of the code
is safe. There is only one cast required in the code compared to many (each in the linked
list) in the old implementation. New constructs like {{std::array}} also make it possible to
integrate bounds checks into the code to help prevent buffer overflows.

Just to reiterate, by no means am I trying to claim that the current patch is perfect and free
of bugs. People, myself included, make mistakes all the time. A modern tool, however, can help
people catch mistakes at the beginning of the development cycle and avoid them.
I believe that this is what good software engineering practice should do. I don't see why
this is a philosophical debate about which language is better (though I can happily be convinced
either way), or why writing safer code that makes correctness easier to achieve should be out
of scope during development.

> Native RPCv9 client
> -------------------
>
>                 Key: HADOOP-10389
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10389
>             Project: Hadoop Common
>          Issue Type: Sub-task
>    Affects Versions: HADOOP-10388
>            Reporter: Binglin Chang
>            Assignee: Colin Patrick McCabe
>         Attachments: HADOOP-10388.001.patch, HADOOP-10389-alternative.000.patch, HADOOP-10389.002.patch, HADOOP-10389.004.patch, HADOOP-10389.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
