hbase-dev mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Status of Huawei's 2' Indexing?
Date Mon, 16 Mar 2015 17:16:44 GMT
Andrew, because 2+ years ago,  Phoenix wasn’t an Apache project. 

At the time, Huawei was releasing their research on it and Salesforce was implementing it.

I mention the company names because those were the parties involved in the work as well as
the discussion. Also those companies are mentioned in a lot of the earlier documentation.

What pretty much ended those conversations is when I asked “How do you handle table Joins?”.
And again since Phoenix was a Salesforce.com <http://salesforce.com/> project at the
time, his response was that Phoenix doesn’t do table joins.  (Which they supposedly do now…)

I would have gone further to mention Informix’s XPS Distributed Relational Database, however
the last time I talked about some of the lessons learned from the RDBMS advances done back
in the 90’s you seemed to have an issue with it.  Of course there we were talking about
coprocessors and I compared it to the extensibility done to RDBMSs and what worked and what
didn’t.  The irony is that Mike Olson who was part of Illustra is now at Cloudera. (And
Informix eventually got it right)

It’s very disappointing that this issue has been raised again. Once you talk about table joins
the index is orthogonal to the base table and the argument becomes moot. 
Add to this using a different type of index, or allowing multiple indexes to the base table
and you now have the issue of column families all over again, but in spades. Again this makes
Huawei’s idea unworkable.

It would even be pointless to try and hold a discussion on what should happen client side
and what should happen server side to support indexes. 

My suggestion is that when you think you have an answer, stop, go get a few drinks and spend
more time thinking about your answer. 


> On Mar 16, 2015, at 11:41 AM, Andrew Purtell <apurtell@apache.org> wrote:
> I don't understand the repeated mention of "Salesforce" in that invective.
> As a point of fact the work of adding local mutable indexes to Phoenix was
> done by a contributor from Huawei, who has since moved over to Hortonworks,
> if I'm not mistaken - but not like affiliation matters, it really doesn't.
> As for the rest, well I've had to give up on your like and respect, but I
> picked up the pieces of my life a while back after we had that falling out
> over coprocessors.
> On Mon, Mar 16, 2015 at 8:14 AM, Michael Segel <michael_segel@hotmail.com>
> wrote:
>> You’ll have to excuse Andy.
>> He’s a bit slow.  HBASE-13044 should have been done 2 years ago. And it
>> was trivial. Just got done last month….
>> But I digress… The long story short…
>> HBASE-9203 was brain dead from inception.  Huawei’s idea was to index on
>> the region which had two problems.
>> 1) Complexity in that they wanted to keep the index on the same region
>> server
>> 2) Joins become impossible.  Well, actually not impossible, but incredibly
>> slow when compared to the alternative.
>> You really should go back to the email chain.
>> Their defense (including Salesforce who was going to push this approach)
>> fell apart when you asked the simple question: how do you handle joins?
>> That’s their OOPS moment. Once you start to understand that, and allow
>> the index to be orthogonal to the base table, things start to come
>> together.
>> In short, you have a query against either a single table or a join.  You
>> then get the indexes and, assuming that you’re only using the AND
>> predicate, it’s a simple intersection of the index result sets.
>> (Since the result sets are ordered, it’s relatively trivial to walk through
>> and find the intersections of N lists in a single pass.)
>> Now you have your result set of base table row keys and you can work with
>> that data (either returning the records to the client, or as input to a
>> map/reduce job).
>> That’s the 30K view.  There’s more to it, but once Salesforce got the
>> basic idea, they ran with it. It was really that simple concept that the
>> index would be orthogonal to the base table that got them moving in the
>> right direction.
>> To Joseph’s point, indexing isn’t necessarily an RDBMS feature. However,
>> it seems that some of the Committers are suffering from rectal induced
>> hypoxia. HBASE-12853 was created not just to help solve the issue of ‘hot
>> spotting’ but also to get the Committers to focus on bringing the solutions
>> that they glom on in the client, back to the server side of things.
>> Unfortunately the last great attempt at fixing things on the server side
>> was the bastardization of coprocessors which again, suffers from the lack
>> of thought.  This isn’t to say that allowing users to extend the server
>> side functionality is wrong. (Because it isn’t.) But that the
>> implementation done in HBase is a tad lacking in thought.
>> So in terms of indexing…
>> Longer term picture, there has to be some fixes on the server side of
>> things to allow one to associate an index (allowing for different types) to
>> a base table, yet the implementation of using the index would end up
>> becoming a client.  And by client, it would be an external query engine
>> processor that could/should sit on the cluster.
>> But hey! What do I know?
>> I gave up trying to have an intelligent/civilized conversation with Andrew
>> because he just couldn’t grasp the basics.  ;-)
>>> On Mar 13, 2015, at 4:14 PM, Andrew Purtell <apurtell@apache.org> wrote:
>>> When I made that remark I was thinking of a recent discussion we had at a
>>> joint Phoenix and HBase developer meetup. The difference of opinion was
>>> certainly civilized. (smile) I'm not aware of any specific written
>>> discussion, it may or may not exist. I'm pretty sure a revival of
>>> HBASE-9203 would attract some controversy, but let me be clearer this time
>>> than I was before that this is just my opinion, FWIW.
>>> On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph <
>>> Joseph.Rose@childrens.harvard.edu> wrote:
>>>> I saw that it was added to their project. I’m really not keen on bringing
>>>> in all the RDBMS apparatus on top of hbase, so I decided to follow other
>>>> avenues first (like trying to patch 0.98, for better or worse.)
>>>> That Phoenix article seems like a good breakdown of the various indexing
>>>> architectures.
>>>> HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized (as
>>>> are most of them, it seems) so I didn’t know there were these differences
>>>> of opinion. Did I miss the mailing list thread where the architectural
>>>> differences were discussed?
>>>> -j
>> The opinions expressed here are mine, while they may reflect a cognitive
>> thought, that is purely accidental.
>> Use at your own risk.
>> Michael Segel
>> michael_segel (AT) hotmail.com
> -- 
> Best regards,
>   - Andy
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)

The opinions expressed here are mine, while they may reflect a cognitive thought, that is
purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com
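
For readers following the thread, the index-intersection step described in the quoted message (AND predicates over ordered index result sets, walked in a single pass) can be sketched roughly as below. This is only an illustration, not code from HBase or Phoenix: the function name intersect_sorted and the sample row keys are hypothetical, and it assumes each index lookup returns base table row keys sorted ascending with no duplicates.

```python
from typing import List, Sequence


def intersect_sorted(lists: Sequence[Sequence[bytes]]) -> List[bytes]:
    """Intersect N ascending, duplicate-free row-key lists in one pass.

    Hypothetical helper: each input stands for the row keys returned by
    one index lookup (one AND-ed predicate).
    """
    if not lists:
        return []
    positions = [0] * len(lists)  # one cursor per index result set
    result: List[bytes] = []
    # Stop as soon as any result set is exhausted: nothing more can match.
    while all(pos < len(lst) for pos, lst in zip(positions, lists)):
        heads = [lst[pos] for pos, lst in zip(positions, lists)]
        largest = max(heads)
        if all(head == largest for head in heads):
            # Every index agrees on this row key, so it satisfies all predicates.
            result.append(largest)
            positions = [pos + 1 for pos in positions]
        else:
            # Advance only the cursors that are behind the largest head.
            positions = [
                pos + 1 if head < largest else pos
                for pos, head in zip(positions, heads)
            ]
    return result


if __name__ == "__main__":
    by_city = [b"row1", b"row3", b"row5"]
    by_status = [b"row2", b"row3", b"row5", b"row9"]
    by_type = [b"row3", b"row4", b"row5"]
    print(intersect_sorted([by_city, by_status, by_type]))
```

Each cursor only ever moves forward, so the work is bounded by the total number of keys across the result sets. The resulting row keys would then be used to fetch records from the base table or to feed a map/reduce job, as the message describes.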
