hbase-dev mailing list archives

From "Bryan Duxbury (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-794) Language neutral IPC as a first class component of HBase architecture
Date Sun, 03 May 2009 07:04:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705388#action_12705388 ]

Bryan Duxbury commented on HBASE-794:
-------------------------------------

Wow, guess I really should have been watching this issue. I'll try to address some things.

Returning null: Thrift methods can't return null *directly*, but they can return a non-null
struct with none of its fields set, or a non-null struct with a flag set. This isn't anything
new necessarily, but I should note that we do this all over the place at Rapleaf to get around
this restriction. You definitely do not need to use exceptions to communicate "null". Moreover,
using exceptions this way is probably worse than you think, as I *think* returning an exception
causes the connection to close, at least in some libraries. Also, it might be possible to
allow null to be returned by Thrift methods _in general_, with only C++ being unable to return
null. If this is a do-or-die issue, please help us out by opening a ticket over on the Thrift
JIRA so we can discuss solutions.
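
The "non-null struct with none of its fields set" workaround can be sketched in plain Java
as below. This is only an illustration under stated assumptions: LookupResult is a
hypothetical hand-written stand-in for a Thrift-generated struct, and NullWrapperSketch.get
is a hypothetical service method. Neither name comes from Thrift or HBase.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a generated struct with one optional field.
final class LookupResult {
    private byte[] value;      // left unset to mean "no such row" (the logical null)
    private boolean valueSet;  // explicit flag recording whether value was assigned

    static LookupResult missing() {
        return new LookupResult();
    }

    static LookupResult of(byte[] v) {
        LookupResult r = new LookupResult();
        r.value = v;
        r.valueSet = true;
        return r;
    }

    boolean isSetValue() {
        return valueSet;
    }

    byte[] getValue() {
        if (!valueSet) {
            throw new IllegalStateException("value is not set");
        }
        return value;
    }
}

final class NullWrapperSketch {
    // Hypothetical service method: it always returns a non-null struct,
    // so no exception is needed to signal "not found".
    static LookupResult get(Map<String, byte[]> table, String row) {
        byte[] v = table.get(row);
        return v == null ? LookupResult.missing() : LookupResult.of(v);
    }

    public static void main(String[] args) {
        Map<String, byte[]> table = new HashMap<>();
        table.put("row1", new byte[] {1, 2, 3});
        System.out.println("row1 present: " + get(table, "row1").isSetValue());
        System.out.println("nope present: " + get(table, "nope").isSetValue());
    }
}
```

The caller checks isSetValue() before reading the field, which plays the role a null check
would play in a Java-only API.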

Thrift's Java RPC layer: I did in fact write a bunch of the server layer to use native Java
NIO. This code lives in TNonblockingServer (single threaded) and THsHaServer (thread pool).
Both server implementations also add some nice stuff like a fixed total read buffer
size (to protect the server from overload). It's been very robust in our use of the code at Rapleaf
so far. I would recommend it on the strength of my experiences. 

Garbage/instantiation cost: Thrift objects are probably a little less memory efficient than
they need to be right now due to some slightly naive implementation decisions, but I've taken
some steps toward reducing the overhead of an object. Additionally, you could probably reuse some
instances of objects at the top level with almost no work. With a little work in the library,
you could probably reuse most objects all the way down your instance's object tree, saving
you memory. If you are interested in more on this point, shoot me an email and we can talk about
it in more detail.
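
Top-level instance reuse might look roughly like the sketch below: one mutable object is
cleared and refilled on each iteration instead of allocating a new one. ScratchCell and
ReuseSketch are hypothetical names invented for this example, and the clear() method is an
assumption about what a reusable struct would offer, not a description of the Thrift library.

```java
// Hypothetical mutable struct, standing in for a generated object.
final class ScratchCell {
    byte[] row;
    long timestamp;

    // Reset every field so no state leaks from one use to the next.
    void clear() {
        row = null;
        timestamp = 0L;
    }
}

final class ReuseSketch {
    // Sums the row lengths while reusing a single ScratchCell for all rows.
    static int totalBytes(byte[][] rows) {
        ScratchCell scratch = new ScratchCell(); // allocated once, outside the loop
        int total = 0;
        for (int i = 0; i < rows.length; i++) {
            scratch.clear();
            scratch.row = rows[i];
            scratch.timestamp = i;
            total += scratch.row.length;
        }
        return total;
    }

    public static void main(String[] args) {
        byte[][] rows = { {1, 2}, {3}, {} };
        System.out.println("total bytes: " + totalBytes(rows));
    }
}
```

Reuse like this is only safe when nothing keeps a reference to the scratch object after the
iteration that filled it.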

Zero copy system: Right now, Thrift is not zero-copy. I think it would be very cool, though,
to create the framework to make that happen. We'd probably only need to make a few transport
interface changes. Maybe we should open a ticket?

Framed Transport: This is very effective at improving the performance of the Thrift IO stuff,
especially if you're doing real IO without a buffer somewhere in between. It's also mandatory
for using the nonblocking servers.
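
For readers unfamiliar with framing, here is a minimal sketch of the idea, assuming the
common layout of a 4-byte big-endian length followed by the payload. The class and method
names are hypothetical and this is not Thrift's own transport code. A length prefix lets a
nonblocking reader tell whether a complete message has arrived before it starts decoding.

```java
import java.nio.ByteBuffer;

final class FramingSketch {
    // Encode one frame: 4-byte length (ByteBuffer defaults to big-endian), then payload.
    static byte[] encodeFrame(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // Try to decode one frame starting at the buffer's current position.
    // Returns null, consuming nothing, if the full frame has not arrived yet.
    static byte[] decodeFrame(ByteBuffer in, int maxFrameSize) {
        if (in.remaining() < 4) {
            return null; // length prefix incomplete
        }
        int len = in.getInt(in.position()); // peek at the length without consuming it
        if (len < 0 || len > maxFrameSize) {
            throw new IllegalArgumentException("bad frame size: " + len);
        }
        if (in.remaining() < 4 + len) {
            return null; // payload incomplete
        }
        in.getInt(); // now consume the length prefix
        byte[] payload = new byte[len];
        in.get(payload);
        return payload;
    }

    public static void main(String[] args) {
        byte[] frame = encodeFrame(new byte[] {10, 20, 30});
        ByteBuffer partial = ByteBuffer.wrap(frame, 0, 5); // only part of the frame
        System.out.println("partial decodes to null: " + (decodeFrame(partial, 1024) == null));
        byte[] payload = decodeFrame(ByteBuffer.wrap(frame), 1024);
        System.out.println("payload length: " + payload.length);
    }
}
```

The maxFrameSize check is an assumption added here so that a corrupt or hostile length
prefix cannot force a huge allocation.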

Custom protocols: Certainly, if you wanted to, you could write your own Thrift protocol. However,
I would say this defeats the purpose of Thrift, which is to give you a respectable cross-platform
library out of the box. Further, protobuf as a Thrift protocol has been proposed before, and
the two systems are not trivially compatible.

"Raw" RPC: If your goal is to avoid deserializing some stuff, Chad has previously suggested
having the ability to specify that you don't want certain fields deserialized. I don't know
if this is your objective. If your keys and values are actually just byte arrays on either
side, then there isn't any serialization to speak of, beyond the byte[] copy off the wire.
I could imagine doing something to make this a non-copy operation, though. (See comment above
on zero-copy architecture.)
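
The difference between the byte[] copy and a non-copy read can be illustrated with
java.nio.ByteBuffer. This is only a sketch of the general technique; ZeroCopySketch and its
methods are hypothetical names, and nothing here reflects how Thrift's transports are
actually implemented.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ZeroCopySketch {
    // Copying approach: allocates a fresh array and copies the value bytes into it.
    static byte[] copyValue(byte[] wire, int offset, int length) {
        return Arrays.copyOfRange(wire, offset, offset + length);
    }

    // View approach: no payload bytes are copied; the returned buffer
    // shares storage with the wire buffer.
    static ByteBuffer viewValue(byte[] wire, int offset, int length) {
        return ByteBuffer.wrap(wire, offset, length).slice();
    }

    public static void main(String[] args) {
        byte[] wire = {9, 9, 1, 2, 3, 9}; // pretend bytes 2..4 are one value
        byte[] copy = copyValue(wire, 2, 3);
        ByteBuffer view = viewValue(wire, 2, 3);

        wire[2] = 42; // simulate the wire buffer being overwritten

        System.out.println("copy[0] = " + copy[0]);
        System.out.println("view.get(0) = " + view.get(0));
    }
}
```

The view avoids the copy, but it stays valid only as long as the underlying buffer is not
overwritten, which is presumably why transport interface changes would be needed.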

I think Andrew's idea of making a simulator is a great one. Otherwise it's going to mean
a ton of work and a subjective evaluation. 

I also want to say that there are few things I would like to improve as much as Thrift performance.
Thrift is a cornerstone at Rapleaf, so anything we can do to make it faster is a big win.
I am eager and willing to work with anyone who can show me use cases that identify slowness
in Thrift so that I can erase the problem. 

> Language neutral IPC as a first class component of HBase architecture
> ---------------------------------------------------------------------
>
>                 Key: HBASE-794
>                 URL: https://issues.apache.org/jira/browse/HBASE-794
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client, ipc, master, regionserver
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>            Priority: Minor
>
> This issue considers making a language neutral IPC mechanism and wire format a first
> class component of HBase architecture. Clients could talk to the master and regionserver using
> this protocol instead of HRPC at their option.
> Options for language neutral IPC include:
> * Thrift: http://incubator.apache.org/thrift/
> * Protocol buffers: http://code.google.com/p/protobuf/
> * XDR: http://en.wikipedia.org/wiki/External_Data_Representation

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

