hadoop-common-issues mailing list archives

From "Sanjay Radia (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6659) Switch RPC to use Avro
Date Mon, 12 Apr 2010 18:31:57 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856107#action_12856107 ]

Sanjay Radia commented on HADOOP-6659:

While Avro reflection will make it easier to get Avro into Hadoop's wire protocol, I believe
that Avro IDL-driven protocols are necessary for wire compatibility. Why?
* A protocol needs to be designed to be compatible. Serialization technologies like PB and
Avro allow one to add/delete fields easily; I like that, but it can be misleading. I don't
think we designed the current Hadoop protocols carefully with an eye towards compatibility.
** For example, as part of HDFS-1052 we would like to extend the blockId to have an additional
field. Avro would help us do that very easily, but it would not work unless the client side treats
the blockId as an opaque object that is NOT deserialized and is simply passed uninterpreted to
the DNs. I think there are many such examples.
* RMI and Hadoop RPC make it too easy to pass any Java object that one is using internally
across the wire. Avro using reflection will continue that. One needs to examine every object
that is being passed across the wire and decide whether it is necessary and what its type should be.
* PB and Avro are very powerful and useful tools. Unfortunately, a reflection-based approach
makes badly designed protocols appear to be good because it gives you the impression that
your protocol is magically compatible. It mostly is, but it can miss corner cases, and it
encourages creating messy protocols that expose too many types.
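
To illustrate the blockId point, here is a minimal Java sketch. All class and method names are hypothetical (none exist in Hadoop), and the byte layout is an assumption made only for this example; it is not the HDFS-1052 design. It shows an old client that forwards the id as uninterpreted bytes, so the NN and DNs can extend the id without the client changing.

```java
import java.nio.ByteBuffer;

// Hypothetical opaque handle: the client holds bytes it never parses.
final class OpaqueBlockId {
    private final byte[] bytes;
    OpaqueBlockId(byte[] bytes) { this.bytes = bytes.clone(); }
    byte[] toBytes() { return bytes.clone(); }
}

// Hypothetical NN side: the only place that knows how an id is laid out.
final class NameNodeSide {
    // Assumed original layout: one 8-byte block number.
    static OpaqueBlockId encodeV1(long blockNumber) {
        return new OpaqueBlockId(ByteBuffer.allocate(8).putLong(blockNumber).array());
    }
    // Assumed extended layout: block number plus one additional 8-byte field.
    static OpaqueBlockId encodeV2(long blockNumber, long extraField) {
        return new OpaqueBlockId(
            ByteBuffer.allocate(16).putLong(blockNumber).putLong(extraField).array());
    }
}

// Hypothetical client written before the id was extended.
final class OldClient {
    // Forwards the id to a DN without deserializing it.
    static byte[] forwardToDataNode(OpaqueBlockId id) {
        return id.toBytes();
    }
}

// Hypothetical DN side: understands both layouts.
final class DataNodeSide {
    static long[] decode(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        long blockNumber = buf.getLong();
        if (buf.remaining() >= 8) {
            return new long[] { blockNumber, buf.getLong() };
        }
        return new long[] { blockNumber };
    }
}

final class OpaqueBlockIdSketch {
    public static void main(String[] args) {
        // The unchanged client carries an extended id end to end.
        byte[] wire = OldClient.forwardToDataNode(NameNodeSide.encodeV2(42L, 7L));
        long[] fields = DataNodeSide.decode(wire);
        System.out.println("fields seen by DN: " + fields.length);
    }
}
```

Had OldClient instead deserialized the id into its own one-field class and re-serialized it, the additional field would have been dropped on the way to the DN, no matter how easily the serialization layer lets a field be added.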

Hence I propose that we take every single Hadoop protocol and design it for compatibility
using Avro IDL (HDFS-1069, MAPREDUCE-1689). Based on what I have read, I believe Doug
agrees with the above.
Switching to IDL, however, is not a pluggable change; hence this part needs to be done in
a branch.
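
As a rough sketch of what "designing" a wire type means, in contrast to reflecting over an internal class, consider the plain Java below. The classes and fields are hypothetical and are not taken from Hadoop or from any Avro-generated code; in the proposal above, the wire type would be declared in Avro IDL rather than written by hand.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical internal object: mixes protocol data with process-local state.
final class InternalBlockInfo {
    final long blockNumber;
    final long numBytes;
    final List<String> replicaHosts = new ArrayList<>();
    final Object lock = new Object();   // process-local, must not go on the wire
    int accessCount;                    // internal bookkeeping only

    InternalBlockInfo(long blockNumber, long numBytes) {
        this.blockNumber = blockNumber;
        this.numBytes = numBytes;
    }
}

// Hypothetical wire type: only the fields someone reviewed and chose to expose.
final class BlockLocationWire {
    final long blockNumber;
    final long numBytes;
    final List<String> hosts;

    BlockLocationWire(long blockNumber, long numBytes, List<String> hosts) {
        this.blockNumber = blockNumber;
        this.numBytes = numBytes;
        // Defensive copy so later internal changes do not leak into the message.
        this.hosts = Collections.unmodifiableList(new ArrayList<>(hosts));
    }
}

final class WireMapper {
    // The explicit mapping is where each field is examined before it is exposed.
    static BlockLocationWire toWire(InternalBlockInfo info) {
        return new BlockLocationWire(info.blockNumber, info.numBytes, info.replicaHosts);
    }
}
```

A reflection-based scheme would serialize whatever fields InternalBlockInfo happens to have; the explicit mapping forces a decision about each field and its type.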

The question I have been struggling with is: what are the benefits of the reflection scheme, since
I assert that the resulting protocols *cannot* be declared as "the Hadoop wire-compatible
protocols"? The only benefit is that the reflection-based protocols can be used for cross-language
access to Hadoop.

> Switch RPC to use Avro
> ----------------------
>                 Key: HADOOP-6659
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6659
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>            Reporter: Doug Cutting
> This is an umbrella issue for moving HDFS and MapReduce RPC to use Avro.

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

