phoenix-dev mailing list archives

From "" <>
Subject Re: [DISCUSS] Suggestions for Phoenix from HBaseCon Asia notes
Date Tue, 11 Sep 2018 20:53:38 GMT
 Sorry for coming a bit late to this. I've been thinking along some of these lines for a bit.
It seems Phoenix serves 4 distinct purposes:
1. Query parsing and compiling
2. A type system
3. Query execution
4. Efficient HBase interface
Each of these is useful by itself, but we do not expose them as stable interfaces. We have
seen a lot of need to tie HBase into "higher level" services, such as Spark (and Presto, etc).
I think we can get a long way if we separate at least #1 (SQL) from the rest #2, #3, and #4
(Typed HBase Interface - THI).
Phoenix is used via SQL (#1); other tools such as Presto, Impala, Drill, Spark, etc. can interface
efficiently with HBase via THI (#2, #3, and #4).
-- Lars
    On Monday, August 27, 2018, 11:03:33 AM PDT, Josh Elser <> wrote:
 (bcc: dev@hbase, in case folks there have been waiting for me to send 
this email to dev@phoenix)


In case you missed it, there was an HBaseCon event held in Asia 
recently. Stack took some great notes and shared them with the HBase 
community. A few of them touched on Phoenix, directly or in a related 
manner. I think they are good "criticisms" that are beneficial for us to 

1. The phoenix-$version-client.jar size is prohibitively large

In this day and age, I'm surprised that this is a big issue for people. 
I know we have a lot of cruft, most of which comes from Hadoop. We have 
gotten better here over recent releases, but I would guess that there is 
more we can do.

2. Can Phoenix be the de-facto schema for SQL on HBase?

We've long asserted "if you have to ask how Phoenix serializes data, you 
shouldn't be doing it" (a nod that you have to write lots of code). What if 
we turn that on its head? Could we extract our PDataType serialization, 
composite row-key, column encoding, etc into a minimal API that folks 
with their own itches can use?
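
To make the idea concrete, here is a rough sketch of what a tiny, standalone
serialization helper could look like. It is not Phoenix's API: the function
names (encode_int, encode_row_key, and so on) and the (tenant, event_id) key
are hypothetical. It illustrates the general technique of order-preserving
encoding, where a signed integer has its sign bit flipped so that byte order
matches numeric order, and a variable-length string is terminated with a zero
byte before the next key column. The exact byte layout Phoenix uses should be
checked against the PDataType sources rather than taken from this sketch.

```python
import struct
from typing import Tuple

SEPARATOR = b"\x00"  # assumed terminator after a variable-length key column


def encode_int(value: int) -> bytes:
    """Encode a signed 32-bit int so that bytewise order equals numeric order."""
    if not -2**31 <= value < 2**31:
        raise ValueError("value out of range for a 32-bit integer")
    # Big-endian two's complement with the sign bit flipped:
    # negatives then sort before non-negatives when compared as raw bytes.
    return struct.pack(">I", (value & 0xFFFFFFFF) ^ 0x80000000)


def decode_int(data: bytes) -> int:
    """Inverse of encode_int."""
    (raw,) = struct.unpack(">I", data)
    raw ^= 0x80000000
    return raw - 2**32 if raw >= 2**31 else raw


def encode_row_key(tenant: str, event_id: int) -> bytes:
    """Build a composite key: UTF-8 string, zero-byte terminator, encoded int."""
    text = tenant.encode("utf-8")
    if SEPARATOR in text:
        raise ValueError("string column must not contain a zero byte")
    return text + SEPARATOR + encode_int(event_id)


def decode_row_key(key: bytes) -> Tuple[str, int]:
    """Split a composite key back into its (string, int) columns."""
    text, _, rest = key.partition(SEPARATOR)
    return text.decode("utf-8"), decode_int(rest)
```

With helpers along these lines, a tool that writes rows directly could build
keys that sort the same way the SQL layer expects, for example
encode_row_key("acme", -7) sorting before encode_row_key("acme", 3).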

With the growing integrations into Phoenix, we could embrace them by 
providing an API to make what they're doing easier. In the same vein, we 
cement ourselves as a cornerstone of doing it "correctly".

3. Better recommendations to users to not attempt certain queries.

We definitively know that there are certain types of queries that 
Phoenix cannot support well (compared to optimal Phoenix use-cases). 
Users very commonly fall into such pitfalls on their own and this leaves 
a bad taste in their mouth (thinking that the product "stinks").

Can we do a better job of telling the user when and why it happened? 
What would such a user-interaction model look like? Can we supplement 
the "why" with instructions of what to do differently (even if in the 

4. Phoenix-Calcite

This was mentioned as a "nice to have". From what I understand, there 
was nothing explicitly wrong with the implementation or approach, just 
that it was a massive undertaking to continue with little immediate 
gain. Would this be a boon for us to try to continue in some form? Are 
there steps we can take that would help push us along the right path?

Anyways, I'd love to hear everyone's thoughts. While the concerns were 
raised at HBaseCon Asia, the suggestions that accompany them here are 
largely mine ;). Feel free to break them out into their own threads if 
you think that would be better (or say that you disagree with me -- 
that's cool too)!

- Josh