hadoop-common-user mailing list archives

From "Thiago Jackiw" <tjac...@gmail.com>
Subject Re: Talking to HBase via tcp/socket
Date Thu, 13 Dec 2007 03:42:14 GMT
Sorry I missed the rest of this conversation.

Cool, so this Thrift 'server' really seems to be the best way to go. I guess
I wasn't really understanding much about it before.

Any ideas when we should start seeing development on this?

--
Thiago


2007/12/9 edward yoon <webmaster@udanax.org>:
>
> Good idea, chad.
> Isn't it so much better when we discuss like this, rather than unilaterally condemning one another? :)
>
> ------------------------------
>
> B. Regards,
>
> Edward yoon @ NHN, corp.
> Home : http://www.udanax.org
>
>
> > From: chad@powerset.com
> > To: hadoop-user@lucene.apache.org
> > Date: Sat, 8 Dec 2007 10:58:21 -0800
> > Subject: Re: Talking to HBase via tcp/socket
> >
> >
> > I think we might be talking past each other a little bit here. Here is my attempt to clarify my suggestion and some of my thinking - hopefully it will help.
> >
> > First, let me state that I fully support the notion of a socket-based interface to HBase.
> > The REST API is a great starter API - low barrier to entry for many developers to start playing with HBase, but probably not what you want for heavy lifting.
> >
> > Also, let me clear up one possible misunderstanding: Thrift does provide a traditional socket transport.
> >
> > The key thing is that Thrift provides a nice object-based interface on top of that socket transport and it will generate client bindings for a whole host of languages (C++, Java, Ruby, PHP, Python, Perl, Erlang, Haskell, OCaml so far). This will allow better programming abstractions for clients in those languages - they will work with native objects and Thrift will handle all the marshalling and unmarshalling and the details of the transport mechanism.
> >
> > So basically, I think I am supporting the thinking behind the work that Edward has done on the socket server - I am just suggesting that the implementation would be more full-featured, support more languages, and be more developer-friendly if we do it using Thrift. I also don't think it is a tremendous amount of work - like I said, the heavy lifting is probably in designing the APIs (Bryan, thanks for setting up the Wiki page for that).
> >
> > The current notion that we are kicking around is to wrap the Java HBase client in a server that exposes Thrift interfaces to do all the things the HBase client can do. That becomes a gateway that can be communicated with over standard sockets by making use of the Thrift client bindings. The reason I think this is the way to go, at least for now, is that I expect the Java HBase client will become more and more full-featured soon (various kinds of caching, scanner read-ahead buffers, etc.) and I think we should avoid having to implement those features in multiple languages until the project becomes a lot more mature. Also, if there are multiple HBase client processes on a single machine, the gateway will allow any caching or buffering to be shared across those processes. Eventually, all the HBase RPC could be converted over to Thrift and then those who really wanted to could port the HBase client to other languages - although I'd recommend that we hold off on that for quite some time.
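
The gateway idea in the paragraph above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a made-up line-based text protocol in place of Thrift, and InMemoryTableClient is a hypothetical stand-in for the Java HBase client, not a real HBase API. What it tries to show is the shape of the design: one long-lived server process owns the single client (and whatever state it caches), and any number of thin clients talk to it over plain sockets.

```python
import socket
import socketserver
import threading


class InMemoryTableClient:
    """Hypothetical stand-in for the full-featured HBase client."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def put(self, row, column, value):
        with self._lock:
            self._rows.setdefault(row, {})[column] = value

    def get(self, row, column):
        with self._lock:
            return self._rows.get(row, {}).get(column)


class GatewayHandler(socketserver.StreamRequestHandler):
    """Translates one request line per call into a client method call."""

    def handle(self):
        client = self.server.table_client
        for raw in self.rfile:
            parts = raw.decode("utf-8").rstrip("\r\n").split(" ", 3)
            if parts[0] == "PUT" and len(parts) == 4:
                client.put(parts[1], parts[2], parts[3])
                reply = "OK"
            elif parts[0] == "GET" and len(parts) == 3:
                value = client.get(parts[1], parts[2])
                reply = "NOT_FOUND" if value is None else "VALUE " + value
            else:
                reply = "ERROR bad request"
            self.wfile.write((reply + "\n").encode("utf-8"))


class Gateway(socketserver.ThreadingTCPServer):
    """One server process that shares a single client across connections."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address, table_client):
        super().__init__(address, GatewayHandler)
        self.table_client = table_client


def call(address, request_line):
    """Thin client: send one request line, return the one-line reply."""
    with socket.create_connection(address) as conn:
        conn.sendall((request_line + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return reader.readline().rstrip("\n")


if __name__ == "__main__":
    gateway = Gateway(("127.0.0.1", 0), InMemoryTableClient())
    threading.Thread(target=gateway.serve_forever, daemon=True).start()
    address = gateway.server_address
    print(call(address, "PUT row1 info:name hbase"))
    print(call(address, "GET row1 info:name"))
    gateway.shutdown()
    gateway.server_close()
```

With Thrift, the hand-written request parsing and the call() helper would presumably be replaced by generated server and client code, which is where the multi-language support comes from.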
> >
> > It seems to me that the HQL issue is actually orthogonal to this one. I think there is room for an RPC interface that executes direct HBase calls and one that allows for executing HQL. HQL also provides a nice implementation-independent compatibility mechanism for other HBase-like systems - for example, we have talked with the Hypertable folks and they are planning to adopt HQL syntax as well. We probably need to build some kind of standard around HQL too.
> >
> > WRT Bryan's concerns about fresh client libraries for each language, I think the gateway notion can take care of that: the HQL translation into lower-level HBase commands can simply be implemented there, either inside the Java HBase client or as an add-on jar.
> >
> > I do share Bryan's concerns about HQL in terms of whether it truly exploits the full parallelism of HBase, especially if one is expecting to issue a single query and return data from across the entire key space. Perhaps I am missing something, but I think this area needs a little more exploration. I'll try to put together some thoughts on this if I get the time.
> >
> > Chad
> >
> >
>
> > On 12/8/07 9:43 AM, "Bryan Duxbury" wrote:
> >
> > Except, there is NO traditional interface for HBase. We have the
> > choice to build whatever interface we want.
> >
> > I think the fundamental difference between Thrift/REST and an HQL
> > socket server would be the TYPE of the interface. Thrift/REST mostly
> > matches the existing underlying API (Thrift more so than REST), but
> > HQL requires us to develop and maintain a whole SQL-like syntax, and
> > to redefine our operations in terms of SQL, and figure out good ways
> > to manage bulk of data that can be returned, and it wouldn't even be
> > aligned with any known standard, so completely fresh client libraries
> > for every language. It just seems like a lot more effort for what
> > results in a more complex interface than we get with our other efforts.
> >
> > On Dec 8, 2007, at 3:36 AM, edward yoon wrote:
> >
> >>
> >> My notebook has both a USB port and a PS/2 port.
> >> But the maker didn't say the PS/2 port is an unnecessary thing.
> >>
> >> Premature withdrawal of traditional interface will guarantee failure.
> >>
> >> Thanks,
> >> Edward.
> >> ------------------------------
> >> B. Regards,
> >>
> >> Edward yoon @ NHN, corp.
> >> Home : http://www.udanax.org
> >>
> >>> From: chad@powerset.com
> >>> To: hadoop-user@lucene.apache.org
> >>> Date: Fri, 7 Dec 2007 22:44:51 -0800
> >>> Subject: Re: Talking to HBase via tcp/socket
> >>>
> >>>
> >>> The heavy lifting in this exercise is mainly in designing the RPC
> >>> calls themselves - after that, it is probably a simple matter of
> >>> programming.
> >>>
> >>> Anyone want to take a crack at it?
> >>>
> >>> Chad
> >>>
> >>>
> >>> On 12/7/07 11:52 AM, "Bryan Duxbury" wrote:
> >>>
> >>> There's nothing stopping us from creating REST "methods" for
> >>> creating/deleting tables. That's mostly a question of whether or not
> >>> we want to expose the functionality elsewhere than the shell. You
> >>> could create a ticket for that and we can discuss it.
> >>>
> >>> I agree that XML can be heavy, which is why we are implementing the
> >>> ability to use the "Accept: multipart/related" header to get back the
> >>> data as pure binary with boundaries. This should alleviate the
> >>> overhead of using XML for the most part.
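
As a rough sketch of what a client might do with such a header, the Python below builds a request that asks for multipart/related and splits a multipart body into its binary parts. Everything specific here is an assumption for illustration only: the URL is hypothetical, the sample body is made up, and the thread does not describe the actual layout of the REST servlet's response.

```python
import email
import urllib.request

# Hypothetical URL: the thread does not give the REST servlet's address or paths.
REST_URL = "http://localhost:8080/api/mytable/row/row1"

# Made-up response body, used only to show how boundaries separate the parts.
SAMPLE_CONTENT_TYPE = 'multipart/related; boundary="b1"'
SAMPLE_BODY = (
    b"--b1\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"\r\n"
    b"value-one\r\n"
    b"--b1\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"\r\n"
    b"value-two\r\n"
    b"--b1--\r\n"
)


def build_request(url):
    """Build (but do not send) a GET request asking for multipart/related."""
    return urllib.request.Request(url, headers={"Accept": "multipart/related"})


def split_parts(content_type, body):
    """Split a multipart body into a list of raw byte payloads."""
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = email.message_from_bytes(raw)
    if not message.is_multipart():
        raise ValueError("expected a multipart body")
    return [part.get_payload(decode=True) for part in message.get_payload()]


if __name__ == "__main__":
    request = build_request(REST_URL)
    print(request.get_method(), request.full_url, request.get_header("Accept"))
    for payload in split_parts(SAMPLE_CONTENT_TYPE, SAMPLE_BODY):
        print(payload)
```

The point of the design is that the client receives each value as raw bytes between boundaries rather than as text wrapped in XML elements.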
> >>>
> >>> Hey, I hardly know Java, and I'm hacking all sorts of stuff!
> >>> Seriously though, I think that as far as performant cross-platform
> >>> access goes, the future is a Thrift servlet. I don't have a timeline
> >>> on that at all yet.
> >>>
> >>> -Bryan
> >>>
> >>> On Dec 7, 2007, at 11:44 AM, Thiago Jackiw wrote:
> >>>
> >>>> There are a few reasons why I wanted to go with a socket instead of
> >>>> REST, to name a couple:
> >>>>
> >>>> - By applying Edward's patch I was able to gain access to the
> >>>> 'entire' HBase interface, from creating to deleting tables, etc.,
> >>>> which I couldn't do with REST. Is this flexibility something sought
> >>>> for future development?
> >>>> - Performance gain. Working with XML can sometimes be problematic
> >>>> and 'heavy'.
> >>>>
> >>>>> I would suggest exploring building a Thrift servlet that mimics
> >>>>> the structure of the REST servlet
> >>>> That could work if I knew Java :P
> >>>>
> >>>> Anyhow, despite HBase being pretty new, it sure kicks ass. Kudos to
> >>>> you guys.
> >>>>
> >>>> --
> >>>> Thiago
> >>>>
> >>>>
> >>>> On Dec 7, 2007 10:42 AM, Bryan Duxbury wrote:
> >>>>> What's the motivation for using a straight TCP socket rather
> >>>>> than REST? The motivation behind producing a REST interface in the
> >>>>> first place is that since the client still lives in Java, we get
> >>>>> to take advantage of all the built-in Java client work that's been
> >>>>> done. If you're looking for a more lightweight way to interact with
> >>>>> HBase (since REST can be a little heavy at times), then rather than
> >>>>> go the HQL route, I would suggest exploring building a Thrift
> >>>>> servlet that mimics the structure of the REST servlet. This is
> >>>>> something that's been discussed as a next step for HBase
> >>>>> interoperability.
> >>>>>
> >>>>> -Bryan
> >>>>>
> >>>>>
> >>>>> On Dec 6, 2007, at 8:25 PM, Thiago Jackiw wrote:
> >>>>>
> >>>>>> Is there a way to interact with HBase via TCP/socket connection
> >>>>>> directly instead of just using the REST api?
> >>>>>>
> >>>>>> Thanks
> >>>>>
> >>>>>
> >>>
> >>>
> >>>
> >>
> >
> >
> >
>
