hadoop-hdfs-user mailing list archives

From Stephan Gammeter <gamme...@vision.ee.ethz.ch>
Subject Re: A question about RPC
Date Tue, 11 Oct 2011 19:44:04 GMT
When I started writing it I was not yet aware of zmq, so the connection 
layer is written using boost. It would be quite simple to replace that 
with zmq, I just have not gotten around to it yet.
 > i really don't see the need for people reinventing the low level rpc 
stuff over and over again.
I totally agree! What I like about protobufs, however, is that you can 
write your service definition directly in the .proto file, which gets 
parsed for you by the protocol buffer compiler, and you can access the 
parsed definitions via protoc plugins, for example. With the plugins 
+ protoc compiler nicely integrated into a cross-language build system, 
it's not minimal effort to have clients in different languages, it's 
zero effort. It also takes a little bit of code if you want to have 
multiple services on the same port, to be able to list all services, 
get service definitions, etc. Basically I have just taken care of that 
part.
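
To illustrate the "multiple services on the same port" part, here is a 
minimal sketch of a registry that dispatches requests by service and 
method name and can list what is registered. It is not the C++ code 
described above: ServiceRegistry, the "Service.Method" target string and 
the JSON envelope are all hypothetical stand-ins (a real implementation 
would presumably use a protobuf-encoded envelope).

```python
import json


class ServiceRegistry:
    """Routes calls for many named services arriving on one endpoint."""

    def __init__(self):
        self._services = {}  # service name -> {method name -> callable}

    def register(self, service_name, methods):
        self._services[service_name] = dict(methods)

    def list_services(self):
        return sorted(self._services)

    def describe(self, service_name):
        # Stand-in for returning the parsed .proto service definition.
        return sorted(self._services[service_name])

    def dispatch(self, raw_request):
        # A request names its target as "Service.Method"; JSON stands in
        # for the protobuf envelope a real implementation would use.
        request = json.loads(raw_request)
        service_name, method_name = request["target"].split(".", 1)
        handler = self._services[service_name][method_name]
        return json.dumps({"result": handler(request["payload"])}).encode()


registry = ServiceRegistry()
registry.register("Echo", {"Say": lambda payload: payload})
registry.register("Math", {"Add": lambda payload: payload["a"] + payload["b"]})

reply = registry.dispatch(b'{"target": "Math.Add", "payload": {"a": 2, "b": 3}}')
print(registry.list_services(), json.loads(reply))
```

One listening socket would feed every incoming request into dispatch(), 
so adding a service only means one more register() call.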

 > for rpc the comparison of speed is a bit of a moot point, since all 
the latency will be in the communication, not so much in the 
serialization, i suspect.
Yes, the communication of course adds latency, but I am talking about 
throughput here. If you believe comparing speed is a moot point, then 
you must be one of the fortunate people who never had to use SOAP ;) 
You also need to hide the latency properly and make sure you minimize 
copying data around in memory, etc. But I am certain zeromq does a much 
better job there than my boost implementation ;) I haven't benchmarked 
it yet though.
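
The latency versus throughput distinction can be made concrete with a 
back-of-the-envelope model. This is only a sketch: the function names 
and the numbers (1 ms round trip, 100 microseconds of CPU work per 
request) are made up for illustration and are not measurements from 
either implementation.

```python
def sync_qps(round_trip_s, per_request_cpu_s):
    # One request in flight: every call pays the full round trip plus CPU work.
    return 1.0 / (round_trip_s + per_request_cpu_s)


def pipelined_qps(round_trip_s, per_request_cpu_s, in_flight):
    # With several requests in flight the round trip overlaps with CPU work,
    # until the per-request CPU cost (serialization, copies) becomes the limit.
    latency_bound = in_flight / (round_trip_s + per_request_cpu_s)
    cpu_bound = 1.0 / per_request_cpu_s
    return min(latency_bound, cpu_bound)


# Illustrative numbers only.
ROUND_TRIP_S = 0.001
CPU_S = 0.0001

print(round(sync_qps(ROUND_TRIP_S, CPU_S)))
print(round(pipelined_qps(ROUND_TRIP_S, CPU_S, in_flight=64)))
print(round(pipelined_qps(ROUND_TRIP_S, CPU_S / 2, in_flight=64)))
```

In this model a synchronous client is limited by the round trip, while 
a client that keeps enough requests in flight is limited by the CPU 
cost per request, which is where serialization speed and extra memory 
copies start to matter.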

 > but once you communicate using protobuf it also becomes really 
tempting to store in hadoop using protobuf instead of 
writables/sequencefiles, and from what i have heard (i have not tested 
this myself) it is a good deal slower in that situation.
What exactly do you mean by using protobuf instead of 
writables/sequencefiles? I.e. let's assume you just use some 
ProtobufToWritable adapter: I don't see how that would be much slower 
than using writables. Writables and protobufs really just do the same 
job, do they not? Except that protobufs are available in other 
languages, are defined via the proto language, etc. Whether you use 
writables or protobufs, you can most likely serialize faster than you 
can write to disk or to the network. At least that is my feeling so far 
from using protobufs to store stuff in hbase or raw hfiles, but I have 
to admit I have not properly benchmarked this. What kind of file format 
would you use to write serialized protobufs to that would make it so 
slow? I guess in the end one just needs to benchmark everything :)
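
On the file format question, one simple container for serialized 
messages is a stream of length-prefixed records, similar in spirit to 
the varint size prefix written by protobuf's Java writeDelimitedTo. The 
sketch below is an assumption about what such a format could look like, 
not what either poster used; the helper names are hypothetical and 
plain byte strings stand in for serialized protobuf messages.

```python
import io


def write_varint(stream, value):
    # Base-128 varint: 7 payload bits per byte, high bit set on all but the last.
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            stream.write(bytes([byte | 0x80]))
        else:
            stream.write(bytes([byte]))
            return


def read_varint(stream):
    # Returns None on a clean end of stream.
    shift = 0
    result = 0
    while True:
        raw = stream.read(1)
        if not raw:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a varint")
        result |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:
            return result
        shift += 7


def write_record(stream, payload):
    write_varint(stream, len(payload))
    stream.write(payload)


def read_records(stream):
    while True:
        size = read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("truncated record")
        yield payload


buffer = io.BytesIO()
for message in (b"first", b"", b"x" * 300):
    write_record(buffer, message)
buffer.seek(0)
records = list(read_records(buffer))
print([len(record) for record in records])
```

The framing adds only a byte or two per record, so with a format like 
this any slowdown would more likely come from the serialization itself 
or from the I/O, which is what a benchmark would have to separate.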

TL;DR
.proto + protocol buffer plugins for generating rpc clients and servers 
is really handy. Whether writables or protobufs are faster needs to be 
benchmarked, but probably both serialize faster than one can write to 
disk or network.

On 23.09.2011 15:40, Koert Kuipers wrote:
> did you build it on top of zmq? i really don't see the need for people 
> reinventing the low level rpc stuff over and over again. zmq comes 
> with baked in request-response, pub-sub, and pipeline (distributed 
> processing) communication. once you rely on protobuf + zmq for the rpc 
> it is trivial to add clients in other languages, i had java, R and 
> python talking to each other with minimal effort.
>
> for rpc the comparison of speed is a bit of a moot point, since all 
> the latency will be in the communication, not so much in the 
> serialization, i suspect. but once you communicate using protobuf it 
> also becomes really tempting to store in hadoop using protobuf instead 
> of writables/sequencefiles, and from what i have heard (i have not 
> tested this myself) it is a good deal slower in that situation.
>
> On Fri, Sep 23, 2011 at 8:14 AM, Stephan Gammeter 
> <gammeter@vision.ee.ethz.ch> wrote:
>
>     I don't think protobuf are slower than writable actually, they do
>     really well in speed. I actually wrote some rpc code in C++ for
>     protocolbuffers and some swig wrappers to have clients in java. A
>     simple c++ server can easily handle about 20k qps in that setup
>     and this is just with a naive implementation where still some
>     excess data copies happen during the processing of requests. If i
>     have time i would like to opensource it, but i would need some
>     help to get it running properly in other languages, so that it can
>     be truly cross language. (right now servers are only supported in
>     c++, clients are synchronous and asynchronous in c++, in java only
>     synchronous clients are supported)
>
>
>     On 21.09.2011 22:59, Koert Kuipers wrote:
>>     i would love an IDL, plus that modern serialization frameworks
>>     such as protobuf/thrift support versioning (although i still have
>>     issues with different versions of thrift not working nicely
>>     together, argh why is that). the only downside is perhaps that
>>     they are a little slower than writables.
>>
>>     On Wed, Sep 21, 2011 at 3:12 AM, Uma Maheswara Rao G 72686
>>     <maheswara@huawei.com> wrote:
>>
>>         Hadoop has its own RPC mechanism, based mainly on Writables,
>>         to overcome some of the disadvantages of normal serialization.
>>         For more info:
>>         http://www.lexemetech.com/2008/07/rpc-and-serialization-with-hadoop.html
>>
>>         Regards,
>>         Uma
>>         ----- Original Message -----
>>         From: jie_zhou <jie_zhou@xa.allyes.com>
>>         Date: Wednesday, September 21, 2011 12:12 pm
>>         Subject: A question about RPC
>>         To: hdfs-user@hadoop.apache.org
>>
>>         > Dear:
>>         >
>>         > Nice to meet you!
>>         >
>>         > I am a beginner with hadoop. Recently I have been reading the
>>         > source of hadoop's RPC, and now I have a question. As we know,
>>         > hadoop RPC makes use of the dynamic proxy mechanism, but
>>         >
>>         > why not use IDL such as CORBA, or AIDL of Android?
>>         >
>>         > Thanks for your early reply.
>>         >
>>         > Best Regards,
>>         >
>>         > jie
>>
>>
>
>

