hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10389) Native RPCv9 client
Date Fri, 16 May 2014 10:58:11 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999175#comment-13999175
] 

Colin Patrick McCabe commented on HADOOP-10389:
-----------------------------------------------

bq. <multiple call ids in flight discussion>

My belief that we don't support multiple call IDs in flight at once came from a casual,
off-the-record conversation I had with some folks at Hadoop Summit Europe.  It's possible
that the code has improved since then, or that they were out of date.  I admit that I haven't
scrutinized the server code closely enough to give a definitive answer here.

There is an easy way to resolve this, of course: we can modify the C code to put multiple
call IDs in flight at once, and see if the server loses its marbles :)

The important thing to remember is that this is an optimization.  Even if we never do it,
we'll still have a usable native client.  So I'm going to try to get the basic stuff done
first, then perhaps we can circle back on this.  Or maybe if one of you guys wants to do it
in parallel that would work out too.  The tricky part is testing... we need some way to *force*
multiple calls to be in flight on the channel so we know that it works.
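
To make the testing idea concrete, here is a minimal sketch of the client-side bookkeeping
that multiple in-flight calls would need: a small table keyed by call ID, with responses
allowed to arrive in any order.  Everything named here ({{pending_call}}, {{connection}},
{{MAX_IN_FLIGHT}}, the {{conn_*}} functions) is hypothetical and is not taken from the
patch; it involves no real socket or server, only the ID matching.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IN_FLIGHT 4 /* hypothetical cap, not taken from the patch */

/* One outstanding call on a connection (hypothetical structure). */
struct pending_call {
    int32_t call_id; /* ID sent in the request header */
    int in_use;      /* nonzero while a response is still awaited */
};

struct connection {
    struct pending_call calls[MAX_IN_FLIGHT];
    int32_t next_call_id;
};

/* Register a new call; returns its ID, or -1 if the table is full. */
static int32_t conn_start_call(struct connection *conn)
{
    assert(conn != NULL);
    for (int i = 0; i < MAX_IN_FLIGHT; i++) {
        if (!conn->calls[i].in_use) {
            conn->calls[i].call_id = conn->next_call_id++;
            conn->calls[i].in_use = 1;
            return conn->calls[i].call_id;
        }
    }
    return -1;
}

/* Match a response to its call by ID; returns 0, or -1 for an unknown ID. */
static int conn_complete_call(struct connection *conn, int32_t call_id)
{
    assert(conn != NULL);
    for (int i = 0; i < MAX_IN_FLIGHT; i++) {
        if (conn->calls[i].in_use && conn->calls[i].call_id == call_id) {
            conn->calls[i].in_use = 0;
            return 0;
        }
    }
    return -1;
}

/* Count the calls that are still waiting for a response. */
static int conn_in_flight(const struct connection *conn)
{
    int n = 0;
    for (int i = 0; i < MAX_IN_FLIGHT; i++) {
        if (conn->calls[i].in_use)
            n++;
    }
    return n;
}

/* Put three calls in flight, then complete them out of order.
 * Returns 0 if the bookkeeping holds, -1 otherwise. */
static int demo_out_of_order(void)
{
    struct connection conn = { 0 };
    int32_t a = conn_start_call(&conn);
    int32_t b = conn_start_call(&conn);
    int32_t c = conn_start_call(&conn);

    if (conn_in_flight(&conn) != 3) return -1;
    if (conn_complete_call(&conn, c) != 0) return -1; /* last call answers first */
    if (conn_complete_call(&conn, a) != 0) return -1;
    if (conn_complete_call(&conn, a) == 0) return -1; /* duplicate must be rejected */
    if (conn_complete_call(&conn, b) != 0) return -1;
    return conn_in_flight(&conn) == 0 ? 0 : -1;
}
```

A real test would still need the server (or a stub) to hold responses back so that
several requests are genuinely outstanding on the channel at once; this sketch only
covers the client-side matching.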

bq. We can keep functions, but the repeated code in these functions can be eliminated using
abstraction, so as to reduce the binary code size.

I have a patch which implements most of the native client, based on the existing RPC code.
The library I generate is only 3 MB, even including a bunch of stuff which has nothing to
do with RPC.  So although I can see that there might be a potential to optimize code size,
I don't think it should be our highest priority right now.

You also have to keep in mind that code which is not used will be stripped out by the linker.
So if we don't use the async version of a certain RPC (for example), that code will not become
part of {{libhdfs.so}}.  So although you might look at the generated code and go "OMG so much
code!" it's really not that bad.  This is similar to how in C++, every time you instantiate
{{std::map}} with a different type, you get another set of {{std::map}} functions in your binary.
In practice, it is usually not a problem.

Still, if you want to work on optimizing generated code size, I would welcome any patches.
 The challenge would be to reduce the code size while still maintaining RPC-specific error
messages and not regressing performance.
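
As a rough illustration of what such a patch might look like, the sketch below moves the
shared logic into one generic function and leaves each per-RPC wrapper as a thin call that
passes a descriptor, so the error text still names the specific RPC.  This is an assumption
about one possible approach, not how the generated code works today: {{rpc_desc}},
{{rpc_call_generic}} and the wrappers are hypothetical, the method names are only
illustrative, and the transport is reduced to an error code passed in by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-RPC descriptor: the only data that differs between calls. */
struct rpc_desc {
    const char *name; /* method name, used in error messages */
};

/* Shared code path for every RPC.  transport_err stands in for whatever the
 * real send/receive step would report; 0 means success.
 * Returns 0 on success, -1 on failure with a message written to err. */
static int rpc_call_generic(const struct rpc_desc *desc, int transport_err,
                            char *err, size_t err_len)
{
    assert(desc != NULL && err != NULL && err_len > 0);
    if (transport_err != 0) {
        /* The message is still RPC-specific because the name comes from desc. */
        snprintf(err, err_len, "%s: transport error %d",
                 desc->name, transport_err);
        return -1;
    }
    err[0] = '\0';
    return 0;
}

/* Illustrative descriptors; the names are examples only. */
static const struct rpc_desc MKDIRS_DESC = { "mkdirs" };
static const struct rpc_desc GET_FILE_INFO_DESC = { "getFileInfo" };

/* Thin wrappers: one small function per RPC instead of a full copy of the logic. */
static int rpc_mkdirs(int transport_err, char *err, size_t err_len)
{
    return rpc_call_generic(&MKDIRS_DESC, transport_err, err, err_len);
}

static int rpc_get_file_info(int transport_err, char *err, size_t err_len)
{
    return rpc_call_generic(&GET_FILE_INFO_DESC, transport_err, err, err_len);
}
```

Whether the extra indirection costs anything measurable would have to be benchmarked, which
is the "not regressing performance" part of the challenge.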

> Native RPCv9 client
> -------------------
>
>                 Key: HADOOP-10389
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10389
>             Project: Hadoop Common
>          Issue Type: Sub-task
>    Affects Versions: HADOOP-10388
>            Reporter: Binglin Chang
>            Assignee: Colin Patrick McCabe
>         Attachments: HADOOP-10388.001.patch, HADOOP-10389.002.patch, HADOOP-10389.004.patch,
HADOOP-10389.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
