hbase-user mailing list archives

From "Charles Mason" <charlie....@gmail.com>
Subject Re: PyLucene JCC, C++, C, JNI, and Thrift APIs
Date Mon, 19 Jan 2009 11:30:49 GMT
On Thu, Jan 15, 2009 at 2:08 AM, Wesley Chow <wes.chow@s7labs.com> wrote:
> The HBase meetup was great -- thanks for putting together the Skype chat for
> those of us in the rest of the world.
> There was some talk about a C API via Thrift. The PyLucene folk have a code
> generator for using C++ and Python with JNI:
> http://svn.osafoundation.org/pylucene/trunk/jcc/jcc/README. It seems to me
> that this might be a reasonable route as well, though I have no clue how
> active or stable that code is.
> But, a couple of questions:
> Does anybody care if there's a C++ API, but no C API?
> Is HBase RPC better than Thrift? If so, can Thrift really beat JNI? If not,
> prefer PyLucene's JCC over Thrift.
> If HBase RPC is worse than Thrift, then adopting Thrift and dropping RPC
> seems smart to me. You save on the messaging layer work, plus you get all
> those other language bindings for free.
Thrift is certainly very useful. I have just released an HBase ORM-like
interface called OHM
(http://belowdeck.kissintelligentsystems.com/ohm/). It is designed
to be cross-platform, so the Thrift APIs are essential to us, as most of
our project is written in C#. OHM has a compiler which generates the
interface code for each language. If we didn't have the Thrift APIs
it would be difficult to interface with languages like .Net and Perl.

I may be wrong, but doesn't the current Thrift API implementation just
provide an interface to the existing Java client? I asked about the
typical production use case for the Thrift API and was told that
you run the Thrift server on each client (web server), so
Thrift only uses a local connection and the Java client then
talks to the cluster over Hadoop RPC.

You could, I am sure, replace the Hadoop RPC with a Thrift-based one,
but wouldn't you need all the client logic to be reimplemented to take
advantage of that language neutrality? Wouldn't that split a lot of
development effort across trying to keep all the clients feature
complete and compatible? Maybe I am overestimating the amount of
work in porting the client, but it would certainly cut out one RPC.

I don't know much about the Hadoop RPC, but I know Thrift is designed
to be very efficient in terms of bytes sent down the wire, light years
ahead of XML-based RPCs. I have looked at some of the implementation
details of Google's own RPC tooling, and it is impressive what lengths
they go to for this efficiency. They even have a more efficient way
of encoding integers onto the wire, for most use cases, than just
dumping the bits from memory. Of course, some of these tricks trade
CPU time for that wire efficiency. In a cluster environment where
everything is on gigabit links, I am not so sure it is such a good
idea; I suppose only benchmarking would tell.
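The integer encoding described above sounds like base-128 "varints" as
used by Protocol Buffers: each byte carries seven payload bits, with the
high bit flagging that more bytes follow, so small values take one byte
instead of a fixed four or eight. A minimal Python sketch of the idea
(illustrative only, not the actual Google code):

```python
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint:
    seven payload bits per byte, least-significant group first,
    high bit set on every byte except the last."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(b)         # final byte: high bit clear
            return bytes(out)

def decode_varint(data):
    """Decode a varint from the start of `data`;
    return (value, number_of_bytes_consumed)."""
    value = shift = 0
    for i, b in enumerate(data):
        value |= (b & 0x7F) << shift
        if not (b & 0x80):
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")
```

So 300 encodes in two bytes rather than the four a fixed 32-bit field
would use; the win disappears (and reverses) for large values, which is
part of the CPU-versus-bytes trade-off mentioned above.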

Charlie M
