hbase-dev mailing list archives

From "Rose, Joseph" <Joseph.R...@childrens.harvard.edu>
Subject Re: Status of Huawei's 2' Indexing?
Date Mon, 16 Mar 2015 17:46:17 GMT
Alright, let’s see if I can get this discussion back on track.

I have a sensibly defined table for patient data; its rowkey is simply
lastname:firstname, since it’s convenient for the bulk of my lookups.
Unfortunately I also need to efficiently find patients using an ID string,
whose literal value is buried in a value field. I’m sure this situation is
not foreign to the people on this list.

It’s been suggested that I implement 2’ indexes myself — fine. All the
research I’ve done seems to end with that suggestion, with the exception
of Phoenix (I don’t want the RDBMS layer) and Huawei’s stuff (which seems
to incite some discussion here). I’m happy to put this together but I’d
rather go with something that has been vetted and has a larger developer
community than one (i.e., ME). Besides, I have a full enough plate at the
moment that I’d rather not have to do this, too.
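
For concreteness, here is a minimal sketch of the hand-rolled approach: a second "index table" keyed by the ID string that points back at the base rowkey. Plain Python dicts stand in for the two HBase tables, so no HBase API is used, and the names `PatientStore`, `ID_COLUMN` and `patient_id` are assumptions made up for illustration.

```python
from typing import Dict, Optional

ID_COLUMN = "patient_id"  # hypothetical column that holds the ID string


class PatientStore:
    """In-memory stand-in for a base table plus a hand-maintained index table."""

    def __init__(self) -> None:
        self.base: Dict[str, Dict[str, str]] = {}  # rowkey -> columns
        self.index: Dict[str, str] = {}            # patient ID -> base rowkey

    def put(self, lastname: str, firstname: str, columns: Dict[str, str]) -> str:
        rowkey = f"{lastname}:{firstname}"
        old_id = self.base.get(rowkey, {}).get(ID_COLUMN)
        new_id = columns.get(ID_COLUMN)
        # Drop the stale index entry when the ID changes or disappears.
        if old_id is not None and old_id != new_id:
            self.index.pop(old_id, None)
        self.base[rowkey] = dict(columns)
        if new_id is not None:
            self.index[new_id] = rowkey
        return rowkey

    def get_by_id(self, patient_id: str) -> Optional[Dict[str, str]]:
        # Two lookups: index table first, then the base table.
        rowkey = self.index.get(patient_id)
        return None if rowkey is None else self.base.get(rowkey)


store = PatientStore()
store.put("Doe", "Jane", {ID_COLUMN: "MRN-001", "dob": "1980-01-01"})
print(store.get_by_id("MRN-001"))
```

In this toy version the base write and the index write happen together. Against a real cluster they would be two separate writes to two tables, and keeping them consistent when one of them fails is the part I would rather not own by myself.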

Are there constructive suggestions regarding how I can proceed with HBase?
Right now even a well-vetted local index would be a godsend.

Thanks.


-j


p.s., I’ll refer you to this post for a slightly more detailed rundown of
how I plan to do things:
http://article.gmane.org/gmane.comp.java.hadoop.hbase.user/46467


On 3/16/15, 12:18 PM, "Michael Segel" <michael_segel@hotmail.com> wrote:

>Joseph, 
>
>The issue with Andrew goes back a few years.  His comment about having a
>civilized discussion was a personal dig at me.
>
>
>> On Mar 16, 2015, at 10:38 AM, Rose, Joseph <Joseph.Rose@childrens.harvard.edu> wrote:
>> 
>> Michael,
>> 
>> I don’t understand the invective. I’m sure you have something to
>> contribute, but when you bring on this tone the only thing I hear is the
>> snide comments.
>> 
>> 
>> -j
>> 
>> 
>> P.s., I’ll refer you to this:
>> https://hbase.apache.org/book.html#_joins
>> 
>> 
>> On 3/16/15, 11:15 AM, "Michael Segel" <michael_segel@hotmail.com> wrote:
>> 
>>> You’ll have to excuse Andy.
>>> 
>>> He’s a bit slow.  HBASE-13044 should have been done 2 years ago. And it
>>> was trivial. Just got done last month….
>>> 
>>> But I digress… The long story short…
>>> 
>>> HBASE-9203 was brain dead from inception.  Huawei’s idea was to index on
>>> the region, which had two problems.
>>> 1) Complexity in that they wanted to keep the index on the same region
>>> server
>>> 2) Joins become impossible.  Well, actually not impossible, but
>>> incredibly slow when compared to the alternative.
>>> 
>>> You really should go back to the email chain.
>>> Their defense (including Salesforce, who was going to push this
>>> approach) fell apart when you asked the simple question: how do you
>>> handle joins?
>>> 
>>> That’s their OOPS moment. Once you understand that, and allow the
>>> index to be orthogonal to the base table, things start to come together.
>>> 
>>> In short, you have a query either against a single table or across a
>>> join.  You then get the indexes and, assuming that you’re only using the
>>> AND predicate, it’s a simple intersection of the index result sets.
>>> (Since the result sets are ordered, it’s relatively trivial to walk
>>> through and find the intersection of N lists in a single pass.)
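
The single-pass intersection described above can be sketched as follows. This is only an illustration, assuming each index lookup returns base-table row keys sorted ascending with no duplicates; `intersect_sorted` is a hypothetical helper, not part of HBase or Phoenix.

```python
from typing import List, Sequence


def intersect_sorted(lists: Sequence[Sequence[str]]) -> List[str]:
    """Intersect N ascending, duplicate-free lists with one cursor per list."""
    if not lists:
        return []
    pos = [0] * len(lists)
    out: List[str] = []
    # Stop as soon as any list is exhausted: nothing further can match.
    while all(p < len(lst) for p, lst in zip(pos, lists)):
        heads = [lst[p] for p, lst in zip(pos, lists)]
        target = max(heads)
        if all(h == target for h in heads):
            # Every cursor sits on the same row key, so it is in all lists.
            out.append(target)
            pos = [p + 1 for p in pos]
        else:
            # Advance each lagging cursor up to (or past) the largest head.
            for i, lst in enumerate(lists):
                while pos[i] < len(lst) and lst[pos[i]] < target:
                    pos[i] += 1
    return out


by_city = ["doe:jane", "roe:rick", "smith:al"]
by_year = ["adams:amy", "roe:rick", "smith:al", "young:bo"]
print(intersect_sorted([by_city, by_year]))
```

Each cursor only ever moves forward, so the total work is bounded by the combined length of the index result sets.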
>>> 
>>> 
>>> Now you have your result set of base table row keys and you can work with
>>> that data (either returning the records to the client, or as input to a
>>> map/reduce job).
>>> 
>>> That’s the 30K view.  There’s more to it, but once Salesforce got the
>>> basic idea, they ran with it. It was really that simple concept, that the
>>> index would be orthogonal to the base table, that got them moving in the
>>> right direction.
>>> 
>>> 
>>> To Joseph’s point, indexing isn’t necessarily an RDBMS feature. However,
>>> it seems that some of the Committers are suffering from rectal induced
>>> hypoxia. HBASE-12853 was created not just to help solve the issue of
>>> ‘hot spotting’ but also to get the Committers to focus on bringing the
>>> solutions that they glom on in the client back to the server side of
>>> things.
>>> 
>>> Unfortunately, the last great attempt at fixing things on the server side
>>> was the bastardization of coprocessors, which again suffers from a lack
>>> of thought.  This isn’t to say that allowing users to extend the server
>>> side functionality is wrong (because it isn’t), but that the
>>> implementation done in HBase is a tad lacking in thought.
>>> 
>>> So in terms of indexing…
>>> Longer term, there have to be some fixes on the server side of things
>>> to allow one to associate an index (allowing for different types) with
>>> a base table, yet the implementation that uses the index would end up
>>> being a client.  By client, I mean an external query engine processor
>>> that could/should sit on the cluster.
>>> 
>>> But hey! What do I know?
>>> I gave up trying to have an intelligent/civilized conversation with
>>> Andrew because he just couldn’t grasp the basics.  ;-)
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On Mar 13, 2015, at 4:14 PM, Andrew Purtell <apurtell@apache.org> wrote:
>>>> 
>>>> When I made that remark I was thinking of a recent discussion we had at
>>>> a joint Phoenix and HBase developer meetup. The difference of opinion was
>>>> certainly civilized. (smile) I'm not aware of any specific written
>>>> discussion; it may or may not exist. I'm pretty sure a revival of
>>>> HBASE-9203 would attract some controversy, but let me be clearer this
>>>> time than I was before that this is just my opinion, FWIW.
>>>> 
>>>> 
>>>> On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph <
>>>> Joseph.Rose@childrens.harvard.edu> wrote:
>>>> 
>>>>> I saw that it was added to their project. I’m really not keen on
>>>>> bringing in all the RDBMS apparatus on top of hbase, so I decided to
>>>>> follow other avenues first (like trying to patch 0.98, for better or
>>>>> worse.)
>>>>> 
>>>>> That Phoenix article seems like a good breakdown of the various indexing
>>>>> architectures.
>>>>> 
>>>>> HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized
>>>>> (as are most of them, it seems) so I didn’t know there were these
>>>>> differences of opinion. Did I miss the mailing list thread where the
>>>>> architectural differences were discussed?
>>>>> 
>>>>> 
>>>>> -j
>>> 
>>> The opinions expressed here are mine, while they may reflect a cognitive
>>> thought, that is purely accidental.
>>> Use at your own risk.
>>> Michael Segel
>>> michael_segel (AT) hotmail.com
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>
>The opinions expressed here are mine, while they may reflect a cognitive
>thought, that is purely accidental.
>Use at your own risk.
>Michael Segel
>michael_segel (AT) hotmail.com
>
>
>
>
>
