accumulo-dev mailing list archives

From Josh Elser
Subject Re: SQL layer over Accumulo?
Date Tue, 29 Apr 2014 18:57:06 GMT
> @Josh - it's less baked in than you'd think on the client where the query
> parsing, compilation, optimization, and orchestration occurs. The
> client/server interaction is hidden behind the ConnectionQueryServices
> interface, the scanning behind ResultIterator (in
> particular ScanningResultIterator), the DML behind MutationState, and
> KeyValue interaction behind KeyValueBuilder. Yes, though, it would require
> some more abstraction, but probably not too bad, though. On the
> server-side, the entry points would all be different and that's where I'd
> need your insights for what's possible.

Definitely. I'm a little concerned about what's expected to be provided 
by the "database" (HBase, Accumulo) as I believe HBase is a little more 
flexible in allowing writes internally where Accumulo has thus far said 
"you're gonna have a bad time".

> @Eric - I agree about having txn support (probably through snapshot
> isolation) by controlling the timestamp, and then layering indexing on top
> of that. That's where we're headed. But I wouldn't let that stop the effort
> - it would just be layered on top of what's already there. FWIW, there's
> another interesting indexing model that has been termed "local indexing",
> which is being worked on right now (should be available in either our 4.1
> or 4.2 release). In this model, the
> table data and index data are co-located on the same region server through
> a kind of "buddy" region mechanism. The advantage is that you take no hit
> at write time, as you're writing both the index and table data together.
> Not sure how/if this would transfer over to the Accumulo world.

Interesting. Given that Accumulo doesn't have a fixed column family 
schema, this might make index generation even easier (maybe "cleaner" is 
the proper word). You could easily co-locate the indices with the data, 
giving them a proper name.
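
To make the co-location idea a bit more concrete, here is a rough sketch in Python. It does not use the Accumulo API; it only models one tablet as a key space sorted by (row, column family, column qualifier). The `idx_` and `rec_` family prefixes, the shard-id-as-row layout, and the names `Tablet`, `write_record`, and `lookup` are all assumptions made up for illustration. It also assumes field values never contain a NUL byte.

```python
import bisect

INDEX_PREFIX = "idx_"  # hypothetical naming convention for index column families


class Tablet:
    """Toy stand-in for one tablet: entries kept sorted by (row, cf, cq)."""

    def __init__(self):
        self.keys = []    # sorted list of (row, cf, cq) tuples
        self.values = {}  # (row, cf, cq) -> value

    def put(self, row, cf, cq, value=""):
        key = (row, cf, cq)
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def scan(self, start, stop):
        """Yield (key, value) pairs with start <= key < stop, in sorted order."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        for key in self.keys[lo:hi]:
            yield key, self.values[key]


def write_record(tablet, shard, record_id, fields):
    """Write data and index entries under the same row (the shard id)."""
    for name, value in fields.items():
        # Data entry: family "rec_<id>", qualifier is the field name.
        tablet.put(shard, "rec_" + record_id, name, value)
        # Index entry: family "idx_<field>", qualifier "<value>\0<id>".
        tablet.put(shard, INDEX_PREFIX + name, value + "\0" + record_id)


def lookup(tablet, shard, field, value):
    """Return ids of records in this shard whose field equals value."""
    cf = INDEX_PREFIX + field
    start = (shard, cf, value + "\0")
    stop = (shard, cf, value + "\x01")
    return [cq.split("\0", 1)[1] for (_, _, cq), _ in tablet.scan(start, stop)]


tablet = Tablet()
write_record(tablet, "shard_0", "r1", {"city": "Boston", "name": "Ann"})
write_record(tablet, "shard_0", "r2", {"city": "Boston", "name": "Bob"})
write_record(tablet, "shard_0", "r3", {"city": "Austin", "name": "Cy"})
print(lookup(tablet, "shard_0", "city", "Boston"))  # ['r1', 'r2']
```

Because the index and data entries share a row, they would land in the same tablet, and the family name alone separates the two. Whether this maps cleanly onto what Phoenix expects from a local index is an open question.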

Problem still exists that we don't have a solid way to do this solely 
inside of Accumulo ATM. I'd imagine that if someone stepped up to 
implement coprocessors, we'd be taking the route of a separate, 
standalone process (as opposed to in-RegionServer). Hypothetically, we 
could do the same for Phoenix in the short-term.

Can you quantify what would be expected by Accumulo to integrate with 
Phoenix (maybe list what exactly is done inside of HBase at a high 
level?) so that we could give some more targeted ideas/feelings as to 
what the level of work would be inside Accumulo?

> TLDR? Let's continue in the JIRA?

Mailing list is fine by me while we get this hashed out :). We can 
move to Jira when we start getting into specifics.
