accumulo-dev mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: SQL layer over Accumulo?
Date Tue, 29 Apr 2014 18:41:55 GMT
@Mike - thanks for pointing out that JIRA. I'll comment there with more
detail. My high-level thinking would be to work with your community to do a
feasibility study and perhaps a POC. I'd, of course, be relying on your
expertise in Accumulo, as my knowledge is pretty limited.

@Jeremy - take a look at the prior presentations to get a better idea:
http://phoenix.incubator.apache.org/resources.html. In particular, take a
look at the ApacheCon presentation and the kinds of pushdown we do to the
server.

@Josh - it's less baked in than you'd think on the client where the query
parsing, compilation, optimization, and orchestration occurs. The
client/server interaction is hidden behind the ConnectionQueryServices
interface, the scanning behind ResultIterator (in
particular ScanningResultIterator), the DML behind MutationState, and
KeyValue interaction behind KeyValueBuilder. It would still require some
more abstraction, but probably nothing too bad. On the server side, the
entry points would all be different, and that's where I'd need your
insights into what's possible.
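
As a rough illustration of the kind of seam described above, here is a minimal Python sketch. It is not Phoenix code: ScanBackend, InMemoryBackend, and run_range_query are hypothetical names chosen for the example, and the in-memory class merely stands in for an HBase- or Accumulo-specific implementation.

```python
from typing import Dict, Iterator, List, Protocol, Tuple

Row = Tuple[bytes, bytes]  # (row key, value)


class ScanBackend(Protocol):
    """Hypothetical seam: the only thing the query layer knows about storage."""

    def scan(self, start: bytes, stop: bytes) -> Iterator[Row]: ...


class InMemoryBackend:
    """Stand-in for a store-specific implementation (assumption for the sketch)."""

    def __init__(self, rows: Dict[bytes, bytes]) -> None:
        # Keep rows sorted by key, as a sorted key/value store would.
        self._rows = sorted(rows.items())

    def scan(self, start: bytes, stop: bytes) -> Iterator[Row]:
        # Yield rows whose key falls in the half-open range [start, stop).
        for key, value in self._rows:
            if start <= key < stop:
                yield key, value


def run_range_query(backend: ScanBackend, start: bytes, stop: bytes) -> List[bytes]:
    """Store-agnostic client logic: it never touches a store-specific API."""
    return [value for _, value in backend.scan(start, stop)]


backend = InMemoryBackend({b"a1": b"x", b"a2": b"y", b"b1": b"z"})
result = run_range_query(backend, b"a", b"b")
```

Swapping the store would then mean writing another class with the same scan method, while run_range_query stays unchanged.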

@Donald - you make a good point. We've stretched the capabilities of SQL,
especially around DDL, to support views
(http://phoenix.incubator.apache.org/views.html), which allow you to add new
columns, and read-time schema
(http://phoenix.incubator.apache.org/dynamic_columns.html), which allows you
to specify column definitions at read/write time.
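
To make the read-time schema idea concrete, here is a small Python sketch under stated assumptions. It does not use Phoenix or its SQL syntax: the stored row, the column names, and read_with_schema are all hypothetical. The point is only that the stored cells are untyped bytes and the caller supplies the column definitions when reading.

```python
import struct

# Stored cells are untyped bytes; no schema is declared for them up front.
# Both column names and values are made up for this example.
stored_row = {
    b"host": b"web-01",
    b"last_gc_ms": struct.pack(">q", 1250),  # 8-byte big-endian integer
}


def read_with_schema(row, column_types):
    """Decode only the columns the caller declares, using caller-supplied types."""
    decoders = {
        "varchar": lambda raw: raw.decode("utf-8"),
        "bigint": lambda raw: struct.unpack(">q", raw)[0],
    }
    return {
        name.decode("utf-8"): decoders[type_name](row[name])
        for name, type_name in column_types.items()
        if name in row  # columns absent from this row are simply skipped
    }


decoded = read_with_schema(
    stored_row, {b"last_gc_ms": "bigint", b"host": "varchar"}
)
```

A different reader could declare a different set of columns over the same stored bytes without any change to the table definition.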

@Eric - I agree about having txn support (probably through snapshot
isolation) by controlling the timestamp, and then layering indexing on top
of that. That's where we're headed. But I wouldn't let that stop the effort
- it would just be layered on top of what's already there. FWIW, there's
another interesting indexing model, termed "local indexing"
(https://github.com/Huawei-Hadoop/hindex), which is being worked on right now
(should be available in either our 4.1 or 4.2 release). In this model, the
table data and index data are co-located on the same region server through
a kind of "buddy" region mechanism. The advantage is that you take no hit
at write time, as you're writing both the index and table data together.
Not sure how/if this would transfer over to the Accumulo world.
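
The co-location idea can be sketched in a few lines of Python. This is a toy model, not hindex or Phoenix code: Partition, LocallyIndexedTable, and the split point are hypothetical, and each Partition stands in for a data region plus its "buddy" index region on one server. A write touches exactly one partition; the assumed trade-off shown here is that a lookup by indexed value has to ask every partition.

```python
import bisect


class Partition:
    """One key range holding both its rows and the index entries for those rows."""

    def __init__(self):
        self.rows = {}   # row key -> indexed column value
        self.index = {}  # indexed value -> set of row keys in this partition

    def put(self, key, value):
        # Drop the stale index entry if the row is being overwritten.
        old = self.rows.get(key)
        if old is not None:
            self.index[old].discard(key)
        # Row and index entry are written together, in the same place.
        self.rows[key] = value
        self.index.setdefault(value, set()).add(key)


class LocallyIndexedTable:
    def __init__(self, split_points):
        self.split_points = sorted(split_points)
        self.partitions = [Partition() for _ in range(len(self.split_points) + 1)]

    def _partition_for(self, key):
        # Route by row key only; the index entry follows the row.
        return self.partitions[bisect.bisect_right(self.split_points, key)]

    def put(self, key, value):
        self._partition_for(key).put(key, value)

    def lookup(self, value):
        # The index is not globally sorted, so every partition is consulted.
        hits = []
        for partition in self.partitions:
            hits.extend(partition.index.get(value, ()))
        return sorted(hits)


table = LocallyIndexedTable(split_points=["m"])
table.put("alice", "red")
table.put("zoe", "red")
table.put("bob", "blue")
matches = table.lookup("red")
```

Here "alice" and "bob" land in the first partition and "zoe" in the second, and each partition indexes only its own rows.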

TLDR? Let's continue in the JIRA?

Thanks,
James



On Tue, Apr 29, 2014 at 7:45 AM, Josh Elser <josh.elser@gmail.com> wrote:

> James,
>
> Thanks for reaching out.
>
> Like Eric said, I'm a little scared because I know that Phoenix is rather
> baked into HBase's API. But, that's half the fun in writing some new code :)
>
> I'd be happy to help evaluate what this would look like - what is
> different (both good and bad) in Accumulo. Like was previously mentioned,
> targeting the Accismus (Percolator) prototype to generate the secondary
> indices would, IMO, be the best target here. I know it's in the very early
> stages right now, but I still believe that it would be the long-term
> solution.
>
> - Josh
>
>
> On 4/29/14, 1:32 AM, James Taylor wrote:
>
>> Hello,
>> Would there be any interest in developing a SQL-layer on top of Accumulo?
>> I'm part of the Apache Phoenix project and we've built a similar system on
>> top of HBase. I wanted to see if there'd be interest on your end in
>> working
>> with us to generalize our client and provide a server that would do
>> Accumulo-specific push down in support of a SQL layer. I suspect there's
>> enough similarity between HBase and Accumulo that this would be feasible.
>> Thanks,
>> James
>>
>>
