drill-dev mailing list archives

From Paul Rogers <prog...@maprtech.com>
Subject Re: A light-weight, versioned client API for Drill
Date Thu, 21 Jul 2016 01:10:54 GMT
Hi Julian,

Thanks! This is the kind of suggestion I was looking for.

I did, in fact, take a look at Avatica: Drill uses it for the existing JDBC driver. To be honest,
I was a bit concerned about the overhead of converting rows to/from JSON. Have you looked
at fitting a binary protocol under Avatica? Would sure be great to reuse the work already
done to handle the many JDBC complexities.

- Paul

> On Jul 20, 2016, at 1:39 PM, Julian Hyde <jhyde@apache.org> wrote:
> Did you consider Avatica? Identical goals, it works already, and there
> are clients in several languages.
> Julian
> On Wed, Jul 20, 2016 at 10:35 AM, Chunhui Shi <cshi@maprtech.com> wrote:
>> Cool. And we know that many 'lightweight' APIs soon become the
>> mainstream APIs.
>> On Tue, Jul 19, 2016 at 10:56 PM, Paul Rogers <progers@maprtech.com> wrote:
>>> Hi All,
>>> As I’ve been playing with and learning about Drill, it struck me that
>>> Drill is a wonderful “industrial strength” query engine, but that the
>>> client API is a bit complex if all an app wants to do is execute a few
>>> queries. I wondered if we need an adapter between the full-blown
>>> columnar, asynchronous RPC that Drill uses internally and the row-based,
>>> synchronous API that most apps know and love.
>>> In thinking about a simpler client API, a few items came to mind:
>>> - We have the JDBC API for Java apps, but the internals of the current
>>> JDBC driver use the Drill client and so the JDBC jar is quite big (20MB).
>>> - The current client API is not versioned, requiring clients to be
>>> upgraded in lock-step with servers. Many admins, however, find it necessary
>>> to upgrade clients on a schedule different from that of the server.
>>> (Imagine upgrading dozens of desktop users at the same time as the Drill
>>> cluster.) Many of the traditional DB products version their interfaces to
>>> simplify this task.
>>> - A cool feature of Drill is schema-on-read, which means Drill may
>>> encounter different schemas as data is read. At present, it is a bit hard
>>> for clients to consume different schemas. It turns out, however, that
>>> stored procedures provide something similar (multiple result sets), an
>>> idea we could leverage to make schema changes into a first-class feature
>>> of the API.
>>> Playing around a bit in my spare time, I found that we can grab lots of
>>> ideas from “traditional” DB APIs to solve the above problems (and more):
>>> - A simplified client API provides a row-based view of results, with
>>> schema changes as a first-class API concept.
>>> - A “direct” version of the client can sit directly on top of the Drill
>>> Client, much like the current JDBC driver.
>>> - Because the client API is simple, it is easy to create a new wire
>>> protocol to carry the required row-based client messages.
>>> - That wire protocol enables a very light-weight remote version of the
>>> client API.
>>> - A new server implements the server-side of the new wire protocol. The
>>> server is an adapter: it converts the “retail” row-based API into the
>>> “wholesale” columnar API of Drill.
>>> - A new JDBC implementation uses the remote API instead of directly using
>>> the Drill Client API.
>>> Because the remote client has no dependencies on Drill (or, indeed,
>>> anything other than the JDK), it is very small.  Indeed, the revised JDBC
>>> jar is about 1% of the size of the existing JDBC driver. (200KB instead of
>>> 20MB.)
>>> The result is a little prototype project called “Jig”. I’d like to toss it
>>> out to the community to see if this is something of interest to others. The
>>> code works just well enough to prove the concept, though I’ve left off the
>>> more “advanced” data types, multiple cursors per connection, and other
>>> details.
>>> The advantage for Java users is a simpler API, smaller JDBC driver, fewer
>>> dependencies and cross-version compatibility.
>>> If we add clients in other languages, then just about any language can
>>> easily query Drill without a Java or ODBC bridge. This would be handy for
>>> that Caravel integration project discussed here a month or so back. Also
>>> for data scientists who prefer Python or R.
>>> In case there is interest in this idea, a more detailed proposal is
>>> available:
>>> https://docs.google.com/document/d/1TpJOEUO-DBDGIidOML2_InpJ-fK4yHmsbV5ncqXT6pM
>>> The code is in a GitHub repo: https://github.com/paul-rogers/drill-jig
>>> The JIRA for this enhancement: DRILL-4791:
>>> https://issues.apache.org/jira/browse/DRILL-4791
>>> This has been a great little learning exercise. Is this something that
>>> we might want to take further? Thoughts on the approach taken?
>>> Thanks,
>>> - Paul
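
For readers of the archive, here is a minimal sketch of the adapter idea in the original proposal: columnar batches go in, row-based results come out, and a schema change simply starts a new result set (much like multiple result sets from a stored procedure). It is not taken from the Jig code; the names `Batch` and `result_sets` are hypothetical, and a schema here is assumed to be just a tuple of column names.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Assumption: a schema is only the ordered column names; types are omitted.
Schema = Tuple[str, ...]
Row = Tuple[Any, ...]


@dataclass
class Batch:
    """Hypothetical columnar batch: one list of values per column."""
    columns: Dict[str, List[Any]]

    @property
    def schema(self) -> Schema:
        return tuple(self.columns)

    def rows(self) -> Iterator[Row]:
        # Transpose the column lists into row tuples.
        return zip(*self.columns.values())


def result_sets(batches: Iterable[Batch]) -> Iterator[Tuple[Schema, List[Row]]]:
    """Yield (schema, rows) pairs; a new pair starts whenever the schema changes."""
    for schema, group in groupby(batches, key=lambda b: b.schema):
        yield schema, [row for batch in group for row in batch.rows()]


if __name__ == "__main__":
    batches = [
        Batch({"id": [1, 2], "name": ["a", "b"]}),
        Batch({"id": [3], "name": ["c"]}),
        # A new column appears mid-stream: schema-on-read in action.
        Batch({"id": [4], "name": ["d"], "city": ["x"]}),
    ]
    for schema, rows in result_sets(batches):
        print(schema, rows)
```

Consecutive batches with the same schema are merged into one result set, so a client that never sees a schema change reads a single ordinary row stream.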
