Subject: Re: SQL layer over Accumulo?
From: James Taylor
To: dev@accumulo.apache.org
Date: Sat, 10 May 2014 21:22:36 -0700

@William - it's entirely possible that my HBase terminology is not mapping well to Accumulo terminology. If Accumulo has a capability not present in HBase that'll handle this, that'd be great.

In HBase terminology, by "row" I mean all of the key values across all column families with the same row key (Row ID in Accumulo?). So in HBase, it doesn't work to store the index data in a separate column family for the same row, because the rows are ordered according to the data table row key. We need the rows of an index to be ordered by the row key formed by the indexed columns instead. Otherwise we have to re-sort the rows, which is more expensive than just doing a scan over the data table.

With buddy regions, the two regions are from different tables with different row key orders. All of the data from "D" for a given region is contained in the buddy region for "I", but in a different order. We equally rely on the buddy region for "I" being in row key order according to the indexed columns (as opposed to the row key order of the data table).

Thanks,
James

On Sat, May 10, 2014 at 7:21 PM, William Slacum <wilhelm.von.cloud@accumulo.net> wrote:

> So there may be a bit of confusion with storing index and data in the same row. By "row" I just mean the logical Accumulo unit, not a "row" as in "thing in my relational table." Synonyms for "row" in this scheme are "shard" and "document partition".
>
> You can store multiple documents and indices for those documents in different column families within the same row. You then have separate readers for the indices and document data ("sources" in Iterator terms).
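To make the layout William describes concrete in Accumulo API terms, here is a rough sketch of writing one document and its index entries into a single shard row, with an index column family and a document column family side by side. The table layout, family names ("i" and "d"), and null-byte key encoding are illustrative assumptions, not something specified in this thread.

    import java.nio.charset.StandardCharsets;

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.MutationsRejectedException;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;

    public class ShardedDocWriter {

      // Hypothetical column families: "i" holds term index entries, "d" holds document data.
      private static final String INDEX_CF = "i";
      private static final String DOC_CF = "d";

      /** Writes one document field plus its index entry into a single shard row. */
      public static void writeDocument(BatchWriter writer, String shardId, String docId,
          String fieldName, String fieldValue) throws MutationsRejectedException {
        Mutation m = new Mutation(shardId);            // row = shard / document partition
        // Index entry: term -> doc id, so a reader over cf "i" finds matching docs in this shard.
        m.put(INDEX_CF, fieldValue + "\u0000" + docId, new Value(new byte[0]));
        // Document entry: doc id + field name -> field value, read by a second "source".
        m.put(DOC_CF, docId + "\u0000" + fieldName,
            new Value(fieldValue.getBytes(StandardCharsets.UTF_8)));
        writer.addMutation(m);
      }
    }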
> Point and range queries are still possible in this fashion, and are made even easier if there's another level that maps terms to rows/shards/partitions. The wikisearch example is an (admittedly rough) implementation of this.
>
> I think looking at how "buddy" regions work may help clarify things, since I imagine it works similarly. If the coprocessor is just reading from a region "I" that contains index data for only region "D", then that maps pretty well to an iterator scanning index data from a column family "I" and fetching documents from a column family "D".
>
> On Thu, May 8, 2014 at 1:09 AM, James Taylor wrote:
>
> > Sorry for the delay in getting back to you - things got a bit crazy with our graduation and HBaseCon happening at the same time.
> >
> > @Josh & Bill - r.e. "Co-locating indices within the same row simplifies this a bit."
> > The secondary indexes need to be in row key order by the indexed columns, so co-locating them in the data table wouldn't allow the lookup and range scan abilities we'd need. The advantage of the index is that you don't need to look at all the data, but can do a point lookup or range scan based on the usage of the indexed columns in a query.
> >
> > @Josh - r.e. "Assuming I understand properly, you don't need to be cognizant of the splits. You just specify the Ranges (where each Range is a start key and end key) and the Accumulo client API does the rest."
> > Typically the Ranges are merge sorted on the client, so this might require an extension to the Accumulo client.
> >
> > r.e. Next steps.
> >
> > We'd definitely need an expert on the Accumulo side to proceed. I'm happy to help on the Phoenix side - I'll post a note on our dev list too to see if there are other folks interested as well. Given the similarities between Accumulo and HBase and the abstraction Phoenix already has in place, I don't think the effort would be large to get something up and running. Maybe a phased approach would make sense: first with query support and next with secondary index support?
> >
> > Not sure where this stacks up in terms of priority for you all. At Salesforce, we saw a specific need for this with HBase, the "big data store" on top of which we'd chosen to standardize. We realized early on that we'd never get the adoption we wanted without providing a different, more familiar programming model: namely SQL. Since we were targeting support for interactive web-based applications, anything map/reduce based wasn't a fit, which led us to create Phoenix. Perhaps there are members in your community in the same boat?
> >
> > Thanks,
> > James
> >
> > On Fri, May 2, 2014 at 1:44 PM, Josh Elser wrote:
> >
> > > On 5/1/14, 2:24 AM, James Taylor wrote:
> > >
> > >> Thanks for the explanations, Josh. This sounds very doable. A few more comments inline below.
> > >>
> > >> James
> > >>
> > >> On Wed, Apr 30, 2014 at 8:37 AM, Josh Elser wrote:
> > >>
> > >>> On 4/30/14, 3:33 AM, James Taylor wrote:
> > >>>
> > >>>> On Tue, Apr 29, 2014 at 11:57 AM, Josh Elser wrote:
> > >>>>
> > >>>>>> @Josh - it's less baked in than you'd think on the client, where the query parsing, compilation, optimization, and orchestration occurs.
> > >>>>>> The client/server interaction is hidden behind the ConnectionQueryServices interface, the scanning behind ResultIterator (in particular ScanningResultIterator), the DML behind MutationState, and KeyValue interaction behind KeyValueBuilder. Yes, it would require some more abstraction, but probably not too bad. On the server side, the entry points would all be different, and that's where I'd need your insights into what's possible.
> > >>>>>>
> > >>>>> Definitely. I'm a little concerned about what's expected to be provided by the "database" (HBase, Accumulo), as I believe HBase is a little more flexible in allowing writes internally, where Accumulo has thus far said "you're gonna have a bad time".
> > >>>>>
> > >>>> Tell me more about what you mean by "allowing writes internally".
> > >>>>
> > >>> Haha, sorry, that was a sufficiently ominous statement with insufficient context.
> > >>>
> > >>> For discussion's sake, let's just say HBase coprocessors and Accumulo iterators are equivalent, purely in the scope of "running server-side code" (in the RegionServer/TabletServer). However, there is a notable difference in where each of those sits in the pipeline.
> > >>>
> > >>> Coprocessors have built-in hooks that let you get updates on PUT/GET/DELETE/etc., as well as pre and post each of those operations. In other words, they provide hooks at a "high database level".
> > >>>
> > >>> Iterators tend to be much closer to the data itself, only dealing with streams of data (other iterators stacked on one another). Iterators implement versioning, visibilities, and can even implement complex searches. The downside of this approach is that iterators lack any means to safely write data _outside of the sorted Key-Value pairs in the tablet currently being processed_. It's possible to make in-tablet updates, but sorted order within a large tablet might make this difficult as well.
> > >>>
> > >>> This is why I was thinking Percolator would be a better solution, as it's meant for handling updates like this server-side. However, I imagine it would be possible, in the short term, to make some separate process between Phoenix and Accumulo which handles writes.
> > >>>
> > >> Another fallback might be to do global index maintenance on the client. It'd just be more expensive, especially if you want to handle out-of-order updates (which are particularly tricky, as you have to get multiple versions of the rows to work out all the different scenarios).
> > >>
> > >> A second fallback might be to support only local indexing. Does Accumulo have the concept of a "custom load balancer" that would allow you to co-locate two regions from different tables? The local-index feature has kind of driven some feature requests on that front for HBase - mainly callbacks when a region is split or relocated. The rows of the local index are prefixed with the region start key to keep them together and identify them.
> > >>
> > > Agreed with what Bill said. Co-locating indices within the same row simplifies this a bit, IMO.
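To make the iterator contract Josh describes above a bit more concrete, here is a bare-bones sketch of a server-side iterator: it can inspect or transform the sorted key/value stream of the tablet being scanned, but the interface offers no hook for writing to other tablets or tables. The class name and behavior are illustrative only, not part of any proposed design.

    import java.io.IOException;
    import java.util.Collection;
    import java.util.Map;

    import org.apache.accumulo.core.data.ByteSequence;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Range;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.iterators.IteratorEnvironment;
    import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
    import org.apache.accumulo.core.iterators.WrappingIterator;

    // Reads from the iterator below it (its "source") and passes data through unchanged.
    // Every method only consumes or re-emits sorted Key/Value pairs from the current
    // tablet; there is no API here for pushing writes anywhere else.
    public class PassThroughIterator extends WrappingIterator {

      @Override
      public void init(SortedKeyValueIterator<Key, Value> source, Map<String, String> options,
          IteratorEnvironment env) throws IOException {
        super.init(source, options, env);   // options could carry e.g. column mappings
      }

      @Override
      public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
          throws IOException {
        // An index-aware iterator could rewrite the range here before delegating.
        super.seek(range, columnFamilies, inclusive);
      }

      @Override
      public Key getTopKey() {
        // Filtering or rewriting of keys would happen around calls like this one.
        return super.getTopKey();
      }
    }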
> > >>>> There's not a lot of hard/fast requirements. Most of what Phoenix does is to optimize performance by leveraging the capabilities of the server. In terms of hard/fast requirements, these come to mind:
> > >>>> - data is returned in row key order from range scans
> > >>>> - a scan may set a start key/stop key to do a range scan
> > >>>> - a row key may be composed of arbitrary bytes
> > >>>> - a client may "pre-split" a table by providing the region boundaries at table create time (we rely on this for salting to prevent hotspotting: http://phoenix.incubator.apache.org/salted.html)
> > >>>> - the client has access to the region boundaries of a table (this allows for better parallelization)
> > >>>> - the client may chunk up a scan into multiple smaller scans and run them in parallel
> > >>>> Some of these may be a bit squishy, as there may be existing machinery already in your client programming model that could be leveraged. The client API of HBase, for example, does not provide the ability out of the box to parallelize a scan, so this is something Phoenix had to add on top (through chunking up scans at or within region boundaries).
> > >>>>
> > >>> All of these look fine. The Accumulo BatchScanner does that parallelization for you, which is really nice (handling tablet migration and all that fun stuff transparently).
> > >>>
> > >> That's nice that Accumulo has this built in. Does it allow the client to specify the split points for the scan in some way?
> > >>
> > > Assuming I understand properly, you don't need to be cognizant of the splits. You just specify the Ranges (where each Range is a start key and end key) and the Accumulo client API does the rest. You can be efficient by structuring your data so that you don't touch every tabletserver for every query -- this seems to be what's being suggested.
> > >
> > > What do you think is next, James?
> > >
> > > I know I won't have a lot of time to devote to heavy development with what I've already signed up for in the next few months, but I'd still like to try to help out where possible. Is anyone else on the Accumulo side interested in getting involved?
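On the "pre-split at table create time" and "access to region boundaries" items from James's list above, the closest Accumulo analogue appears to be adding and listing split points through TableOperations right after creating the table. A minimal sketch, assuming a single leading salt byte per split in the spirit of Phoenix salting (table name and bucket count are made up for illustration):

    import java.util.Collection;
    import java.util.SortedSet;
    import java.util.TreeSet;

    import org.apache.accumulo.core.client.AccumuloException;
    import org.apache.accumulo.core.client.AccumuloSecurityException;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.TableExistsException;
    import org.apache.accumulo.core.client.TableNotFoundException;
    import org.apache.hadoop.io.Text;

    public class PreSplitExample {

      /** Creates a table and immediately adds one split point per salt bucket. */
      public static void createSaltedTable(Connector connector, String table, int saltBuckets)
          throws AccumuloException, AccumuloSecurityException, TableExistsException,
          TableNotFoundException {
        connector.tableOperations().create(table);
        SortedSet<Text> splits = new TreeSet<>();
        for (int i = 1; i < saltBuckets; i++) {
          // Each split is a single leading salt byte, mimicking Phoenix-style salting.
          splits.add(new Text(new byte[] {(byte) i}));
        }
        connector.tableOperations().addSplits(table, splits);

        // The client can also read the current tablet boundaries back, which is the
        // Accumulo counterpart of "access to the region boundaries of a table".
        Collection<Text> tabletBoundaries = connector.tableOperations().listSplits(table);
        System.out.println("tablet boundaries: " + tabletBoundaries);
      }
    }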
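And here is a minimal sketch of the BatchScanner usage Josh describes: the client just supplies Ranges and the library handles tablet location and parallelism. The table name, ranges, and thread count are invented for illustration. Note that a BatchScanner returns entries in no global order across ranges, which lines up with James's point that merge-sorting Ranges on the client might require an extension.

    import java.util.Arrays;
    import java.util.Map.Entry;

    import org.apache.accumulo.core.client.BatchScanner;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.TableNotFoundException;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Range;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;

    public class ParallelRangeScan {

      /** Scans two disjoint key ranges in parallel across whatever tablets hold them. */
      public static void scanRanges(Connector connector) throws TableNotFoundException {
        BatchScanner scanner =
            connector.createBatchScanner("phoenix_data", Authorizations.EMPTY, 8 /* threads */);
        try {
          // Each Range is just a start key and end key; no knowledge of splits is required.
          scanner.setRanges(Arrays.asList(
              new Range("row_0100", "row_0200"),
              new Range("row_0500", "row_0600")));
          for (Entry<Key, Value> entry : scanner) {
            // Results arrive as ranges complete; they are NOT globally sorted across ranges.
            System.out.println(entry.getKey() + " -> " + entry.getValue());
          }
        } finally {
          scanner.close();
        }
      }
    }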