hadoop-common-issues mailing list archives

From "Binglin Chang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8990) Some minor issus in protobuf based ipc
Date Tue, 22 Jan 2013 01:11:11 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559262#comment-13559262 ]

Binglin Chang commented on HADOOP-8990:

bq. Note that since rpc has pluggable rpc engine, the rpc request itself is serialized based
on the pluggable rpc engine. For protobuf-rpc-engine the length is serialized as varint.
I don't understand why protobuf-rpc-engine has to be varint-prefixed. Originally it was
prefixed with a 4-byte int; HADOOP-8084 changed it to a varint prefix, and then some time
later RpcResponseWritable changed to using a varint while RpcRequestWritable kept using the 4-byte int.
I just think a fixed 4-byte int is the better choice, because it is easy to implement in non-blocking code.

Just for example, see Todd's ipc implementation:
  uint32_t length;
  CodedInputStream cis(&rbuf_[rbuf_consumed_], rbuf_available());
  bool success = cis.ReadVarint32(&length);
  if (!success) {
    if (rbuf_available() >= kMaxVarint32Size) {
      // TODO: error handling
      LOG(FATAL) << "bad varint";
    }
    // Not enough bytes in buffer to read varint, ask for at least another byte
    EnsureAvailableLength(rbuf_available() + 1,
        boost::bind(&RpcClient::ReadResponseHeaderLengthCallback, this,
            /* remaining arguments truncated in the original */));
    return;
  }
There is a lot of retry logic needed to read a varint in non-blocking code; it would be much
simpler if the prefix were a fixed 4-byte int, and hence unified.
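
To illustrate the point: with a fixed 4-byte length prefix, a non-blocking reader needs only a single availability check before decoding. The following is a hypothetical sketch (TryReadFixedLength is an assumed name, not from Hadoop or Todd's code), assuming a big-endian prefix:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not from any Hadoop implementation: attempt to decode
// a fixed 4-byte big-endian length prefix from a receive buffer. Unlike a
// varint, "not enough bytes yet" is a single size check -- there is no
// ambiguity between a partial prefix and a malformed one, and no retry state.
bool TryReadFixedLength(const uint8_t* buf, size_t available, uint32_t* length) {
  if (available < 4) {
    return false;  // incomplete prefix: just wait for more bytes
  }
  *length = (uint32_t(buf[0]) << 24) | (uint32_t(buf[1]) << 16) |
            (uint32_t(buf[2]) << 8)  |  uint32_t(buf[3]);
  return true;
}
```

On a false return the caller simply re-arms the read for more bytes; there is no "bad varint" error path at all.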

> Some minor issues in protobuf based ipc
> --------------------------------------
>                 Key: HADOOP-8990
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8990
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Binglin Chang
>            Priority: Minor
> 1. proto file naming
> RpcPayloadHeader.proto includes not only RpcPayloadHeaderProto but also RpcResponseHeaderProto,
which is irrelevant to the file name.
> hadoop_rpc.proto includes only HadoopRpcRequestProto, and the filename "hadoop_rpc" is
strange compared to the other .proto file names.
> How about merging those two files into HadoopRpc.proto?
> 2. proto class naming
> In the rpc request, the callId is included in RpcPayloadHeaderProto, but in the rpc response
it is included in RpcResponseHeaderProto, and there is also HadoopRpcRequestProto; this is just
too confusing.
> 3. The rpc system is not fully protobuf based; there are still some Writables:
> RpcRequestWritable and RpcResponseWritable,
> and the rpc response exception name and stack trace string.
> Also, RpcRequestWritable uses a protobuf-style varint32 prefix, but RpcResponseWritable uses
an int32 prefix; why this inconsistency?
> Currently an rpc request is split into length, PayloadHeader and Payload, and a response
into RpcResponseHeader, response and error message.
> I think wrapping the request and response into a single RequestProto and ResponseProto is better,
because this gives a formal, complete wire format definition;
> otherwise developers need to look into the source code and hard-code the communication format.
> These issues make the rpc interfaces very confusing and hard for developers to use.
> Some of these issues can be solved without breaking compatibility, but some cannot;
at least we need to know what will change and what will stay stable.
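
For reference on the prefix inconsistency the description mentions: a protobuf-style varint32 uses between 1 and 5 bytes depending on the value, while a fixed int32 prefix always uses exactly 4. A minimal sketch of the varint32 encoding (hypothetical helper, not Hadoop code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper illustrating protobuf's varint32 wire encoding:
// each output byte carries 7 bits of the value, low bits first, with the
// high bit set on every byte except the last. Returns the number of bytes
// written (1 to 5). `out` must have room for 5 bytes.
size_t EncodeVarint32(uint32_t value, uint8_t* out) {
  size_t n = 0;
  while (value >= 0x80) {
    out[n++] = uint8_t(value) | 0x80;  // low 7 bits, continuation bit set
    value >>= 7;
  }
  out[n++] = uint8_t(value);           // final byte, continuation bit clear
  return n;
}
```

For example, a message length of 300 encodes as the two bytes 0xAC 0x02 under the varint scheme, whereas a fixed int32 prefix would occupy 4 bytes for any length; a reader of the varint form must handle a partially received prefix.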

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
