hbase-dev mailing list archives

From "Rose, Joseph" <Joseph.R...@childrens.harvard.edu>
Subject Re: Status of Huawei's 2' Indexing?
Date Mon, 16 Mar 2015 18:51:20 GMT
Thanks, Wilm. I’ll look for the thread there.

Obviously I didn’t realize there was so much backstory. I was asking
about this specific implementation because it seems fairly well thought
out and has good commentary in the Jira ticket (HBASE-9203). At the time
I thought it was mostly a dev concern. I think we’ve moved on, as you
pointed out.

I'd be happy to contribute to hbase if I have something to offer. I’m just
starting with this, so let’s see where it takes us.

For those of you joining us late, you can find the continuation here:
http://mail-archives.apache.org/mod_mbox/hbase-user/201503.mbox/%3C550722DA.3040009%40gmail.com%3E


-j


On 3/16/15, 2:09 PM, "Wilm Schumacher" <wilm.schumacher@gmail.com> wrote:

>Hi Joseph,
>
>I think you kicked off this discussion because implementing an indexing
>mechanism for HBase in general is much more complicated than your
>specific problem. The people on this list want to keep every possible
>application (or at least A LOT of them) in mind. A mechanism that is too
>simple wouldn't fit the needs of most users (and so would be useless),
>while a more complicated model is harder to maintain and needs more
>coders, etc. So with your application question you seem to have walked
>right into a very general discussion.
>
>Furthermore, this is a user question, since you don't want to change the
>code of HBase, do you? ;) I'll try to answer on the general user list in
>a couple of minutes, so more people can join the discussion and we can
>move the traffic off this list, okay?
>
>Best wishes
>
>Wilm
>
>On 16.03.2015 at 18:46, Rose, Joseph wrote:
>> Alright, let’s see if I can get this discussion back on track.
>>
>> I have a sensibly defined table for patient data; its rowkey is simply
>> lastname:firstname, since it’s convenient for the bulk of my lookups.
>> Unfortunately I also need to efficiently find patients using an ID
>> string, whose literal value is buried in a value field. I’m sure this
>> situation is not foreign to the people on this list.
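A minimal sketch of the do-it-yourself secondary index described here, assuming an orthogonal index table keyed by the patient ID whose value is the base-table rowkey. Plain Python dicts stand in for the two HBase tables, and the names `base_table`, `id_index`, `put_patient` and `find_by_id` are hypothetical; this is not the HBase client API.

```python
from __future__ import annotations

# Hypothetical in-memory stand-ins for two HBase tables (not the HBase API):
# the base table keyed by "lastname:firstname", and an orthogonal index
# table keyed by patient ID whose value is the base-table rowkey.
base_table: dict[str, dict[str, str]] = {}
id_index: dict[str, str] = {}


def put_patient(last: str, first: str, patient_id: str, **fields: str) -> None:
    """Write a patient row and keep the ID index in step with it."""
    rowkey = f"{last}:{first}"
    old = base_table.get(rowkey)
    if old is not None and old["id"] != patient_id:
        # The ID changed: drop the stale index entry so it no longer points here.
        id_index.pop(old["id"], None)
    base_table[rowkey] = {"id": patient_id, **fields}
    id_index[patient_id] = rowkey


def find_by_id(patient_id: str) -> dict[str, str] | None:
    """Two point lookups: the index table first, then the base table."""
    rowkey = id_index.get(patient_id)
    return None if rowkey is None else base_table.get(rowkey)


def find_by_id_scan(patient_id: str) -> dict[str, str] | None:
    """The no-index fallback: check the ID field of every row (a full scan)."""
    for row in base_table.values():
        if row["id"] == patient_id:
            return row
    return None


put_patient("Doe", "Jane", "MRN-001", dob="1990-01-01")
print(find_by_id("MRN-001"))  # {'id': 'MRN-001', 'dob': '1990-01-01'}
```

HBase only guarantees atomicity within a single row, so in a real cluster the two writes in `put_patient` would be separate mutations against two tables, and a failure between them could leave the index stale. Handling that gap is a large part of what a vetted index implementation has to get right.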
>>
>> It’s been suggested that I implement 2’ (secondary) indexes myself.
>> Fine. All the research I’ve done seems to end with that suggestion,
>> with the exception of Phoenix (I don’t want the RDBMS layer) and
>> Huawei’s stuff (which seems to incite some discussion here). I’m happy
>> to put this together, but I’d rather go with something that has been
>> vetted and has a larger developer community than one (i.e., ME).
>> Besides, I have a full enough plate at the moment that I’d rather not
>> have to do this, too.
>>
>> Are there constructive suggestions regarding how I can proceed with
>> HBase? Right now even a well-vetted local index would be a godsend.
>>
>> Thanks.
>>
>>
>> -j
>>
>>
>> p.s., I’ll refer you to this post for a slightly more detailed rundown
>> of how I plan to do things:
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__article.gmane.org_gmane.comp.java.hadoop.hbase.user_46467&d=BQIDaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=j9wyupjEn0B7jf5KuX71llCBNN37RKmLLRc05fkUwaA79i0DrYaVuQHxlqAccDLc&m=NwQpAjAe0QcCDK7Dp0galpRYD3IvcpoK3xijbLf1WFo&s=lBW_VCH7IruBtyg3PhTjU_CW2-po9IFfiIYNMpglIRk&e=
>>
>>
>> On 3/16/15, 12:18 PM, "Michael Segel" <michael_segel@hotmail.com> wrote:
>>
>>> Joseph, 
>>>
>>> The issue with Andrew goes back a few years. His comment about having a
>>> civilized discussion was a personal dig at me.
>>>
>>>
>>>> On Mar 16, 2015, at 10:38 AM, Rose, Joseph
>>>> <Joseph.Rose@childrens.harvard.edu> wrote:
>>>>
>>>> Michael,
>>>>
>>>> I don’t understand the invective. I’m sure you have something to
>>>> contribute, but when you bring on this tone all I hear are the snide
>>>> comments.
>>>>
>>>>
>>>> -j
>>>>
>>>>
>>>> P.s., I’ll refer you to this:
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__hbase.apache.org_book.html-23-5Fjoins&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=j9wyupjEn0B7jf5KuX71llCBNN37RKmLLRc05fkUwaA79i0DrYaVuQHxlqAccDLc&m=ujJCfI0GwgZ1Qx9be1fW7FIRqFeS-UmWVS304uhfKLs&s=2TGF0r5VvzExMqV31LmI3rQd4B8eJq_PqYKJXUqAjNk&e=
>>>>
>>>>
>>>> On 3/16/15, 11:15 AM, "Michael Segel" <michael_segel@hotmail.com> wrote:
>>>>
>>>>> You’ll have to excuse Andy.
>>>>>
>>>>> He’s a bit slow. HBASE-13044 should have been done 2 years ago. And it
>>>>> was trivial. Just got done last month….
>>>>>
>>>>> But I digress… The long story short…
>>>>>
>>>>> HBASE-9203 was brain dead from inception. Huawei’s idea was to index
>>>>> on the region, which had two problems:
>>>>> 1) Complexity, in that they wanted to keep the index on the same
>>>>>    region server.
>>>>> 2) Joins become impossible. Well, actually not impossible, but
>>>>>    incredibly slow when compared to the alternative.
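To make the region-local trade-off concrete, here is a toy model under stated assumptions: the base table is split into four regions by hypothetical start keys, each region keeps a local ID index over only its own rows, and a separate orthogonal ("global") index is keyed by the ID itself. It only counts index probes per lookup; it does not model joins, writes, or region servers, and none of these names come from HBase.

```python
from __future__ import annotations

import bisect

# Hypothetical region boundaries: region i holds rowkeys >= start key i
# and < start key i+1, the way HBase splits a table by rowkey range.
REGION_START_KEYS = ["", "g", "n", "t"]


def region_for(rowkey: str) -> int:
    """Return the index of the region whose key range contains rowkey."""
    return bisect.bisect_right(REGION_START_KEYS, rowkey) - 1


def build_indexes(rows: dict[str, str]):
    """rows maps base rowkey -> patient ID; build both index styles."""
    local: list[dict[str, list[str]]] = [{} for _ in REGION_START_KEYS]
    global_index: dict[str, list[str]] = {}
    for rowkey, pid in rows.items():
        # Region-local: the entry lives with the region that holds the row.
        local[region_for(rowkey)].setdefault(pid, []).append(rowkey)
        # Orthogonal: one index keyed by the ID, independent of base regions.
        global_index.setdefault(pid, []).append(rowkey)
    return local, global_index


def lookup_local(local, pid: str) -> tuple[list[str], int]:
    """A local index can't tell which region holds pid, so probe every region."""
    hits: list[str] = []
    for region_index in local:
        hits.extend(region_index.get(pid, []))
    return sorted(hits), len(local)


def lookup_global(global_index, pid: str) -> tuple[list[str], int]:
    """The orthogonal index is keyed by pid: a single point lookup."""
    return sorted(global_index.get(pid, [])), 1


rows = {"doe:jane": "MRN-001", "roe:rick": "MRN-002", "zed:amy": "MRN-003"}
local, global_index = build_indexes(rows)
print(lookup_local(local, "MRN-003"))          # (['zed:amy'], 4)
print(lookup_global(global_index, "MRN-003"))  # (['zed:amy'], 1)
```

The flip side, and presumably the reason a region-local design was proposed, is that a local index entry sits next to the row it indexes, so maintaining it on write stays on one server; an orthogonal index shifts that cost onto the write path.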
>>>>>
>>>>> You really should go back to the email chain.
>>>>> Their defense (including Salesforce, who was going to push this
>>>>> approach) fell apart when you asked the simple question of how you
>>>>> handle joins.
>>>>>
>>>>> That’s their OOPS moment. Once you start to understand that and make
>>>>> the index orthogonal to the base table, things start to come
>>>>> together.
>>>>>
>>>>> In short, you have a query either against a single table or, if
>>>>> you’re doing a join, against several tables. You then get the indexes
>>>>> and, assuming that you’re only using the AND predicate, it's a simple
>>>>> intersection of the index result sets. (Since the result sets are
>>>>> ordered, it's relatively trivial to walk through them and find the
>>>>> intersection of N lists in a single pass.)
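The single-pass intersection of ordered result sets can be sketched as follows, assuming each index lookup yields a sorted, duplicate-free list of base-table row keys. `intersect_sorted` is a hypothetical helper, not part of HBase; no pointer ever moves backwards, which is what makes it a single pass over each list.

```python
from __future__ import annotations


def intersect_sorted(lists: list[list[str]]) -> list[str]:
    """Intersect N ascending, duplicate-free lists, walking each list forward once."""
    if not lists or any(not lst for lst in lists):
        return []
    pos = [0] * len(lists)
    result: list[str] = []
    # The candidate is the largest current head: every list must reach it.
    candidate = max(lst[0] for lst in lists)
    while True:
        matched = True
        for i, lst in enumerate(lists):
            # Skip everything in list i that is smaller than the candidate.
            while pos[i] < len(lst) and lst[pos[i]] < candidate:
                pos[i] += 1
            if pos[i] == len(lst):
                return result  # one list is exhausted: no further matches
            if lst[pos[i]] > candidate:
                candidate = lst[pos[i]]  # jump ahead and restart the round
                matched = False
                break
        if matched:
            result.append(candidate)
            # Step every list past the match, then pick the next candidate.
            for i in range(len(lists)):
                pos[i] += 1
                if pos[i] == len(lists[i]):
                    return result
            candidate = max(lst[p] for lst, p in zip(lists, pos))


hits = intersect_sorted([
    ["a", "c", "e", "g"],       # row keys matching predicate 1
    ["c", "d", "e", "g", "i"],  # predicate 2
    ["0", "c", "e", "f", "g"],  # predicate 3
])
print(hits)  # ['c', 'e', 'g']
```

This covers only the AND case discussed in the thread; an OR across predicates would merge (union) the sorted lists instead.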
>>>>>
>>>>>
>>>>> Now you have your result set of base table row keys and you can work
>>>>> with that data (either returning the records to the client, or as
>>>>> input to a map/reduce job).
>>>>>
>>>>> That’s the 30K view. There’s more to it, but once Salesforce got the
>>>>> basic idea, they ran with it. It was really that simple concept, that
>>>>> the index would be orthogonal to the base table, that got them moving
>>>>> in the right direction.
>>>>>
>>>>>
>>>>> To Joseph’s point, indexing isn’t necessarily an RDBMS feature.
>>>>> However, it seems that some of the Committers are suffering from
>>>>> rectal-induced hypoxia. HBASE-12853 was created not just to help solve
>>>>> the issue of ‘hot spotting’ but also to get the Committers to focus on
>>>>> bringing the solutions that they glom on in the client back to the
>>>>> server side of things.
>>>>>
>>>>> Unfortunately the last great attempt at fixing things on the server
>>>>> side was the bastardization of coprocessors, which again suffers from
>>>>> a lack of thought. This isn’t to say that allowing users to extend the
>>>>> server-side functionality is wrong. (Because it isn’t.) The point is
>>>>> that the implementation done in HBase is a tad lacking in thought.
>>>>>
>>>>> So in terms of indexing…
>>>>> In the longer-term picture, there have to be some fixes on the server
>>>>> side to allow one to associate an index (allowing for different types)
>>>>> with a base table, yet the implementation of using the index would end
>>>>> up becoming a client. And by client, I mean an external query engine
>>>>> processor that could/should sit on the cluster.
>>>>>
>>>>> But hey! What do I know?
>>>>> I gave up trying to have an intelligent/civilized conversation with
>>>>> Andrew because he just couldn’t grasp the basics.  ;-)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On Mar 13, 2015, at 4:14 PM, Andrew Purtell <apurtell@apache.org> wrote:
>>>>>>
>>>>>> When I made that remark I was thinking of a recent discussion we had
>>>>>> at a joint Phoenix and HBase developer meetup. The difference of
>>>>>> opinion was certainly civilized. (smile) I'm not aware of any specific
>>>>>> written discussion; it may or may not exist. I'm pretty sure a revival
>>>>>> of HBASE-9203 would attract some controversy, but let me be clearer
>>>>>> this time than I was before that this is just my opinion, FWIW.
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph <Joseph.Rose@childrens.harvard.edu> wrote:
>>>>>>
>>>>>>> I saw that it was added to their project. I’m really not keen on
>>>>>>> bringing in all the RDBMS apparatus on top of hbase, so I decided to
>>>>>>> follow other avenues first (like trying to patch 0.98, for better or
>>>>>>> worse).
>>>>>>>
>>>>>>> That Phoenix article seems like a good breakdown of the various
>>>>>>> indexing architectures.
>>>>>>>
>>>>>>> HBASE-9203 (the ticket that deals with 2’ indexes) is pretty
>>>>>>> civilized (as are most of them, it seems), so I didn’t know there
>>>>>>> were these differences of opinion. Did I miss the mailing list thread
>>>>>>> where the architectural differences were discussed?
>>>>>>>
>>>>>>>
>>>>>>> -j
>>>>> The opinions expressed here are mine, while they may reflect a
>>>>> cognitive thought, that is purely accidental.
>>>>> Use at your own risk.
>>>>> Michael Segel
>>>>> michael_segel (AT) hotmail.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>
