hbase-dev mailing list archives

From Eric Owhadi <eric.owh...@esgyn.com>
Subject RE: Performance issue in the Join query on the HBase tables
Date Fri, 29 Sep 2017 13:24:19 GMT
Hi Wenxing,
From the use case you describe, you may want to take a look at Trafodion or EsgynDB (the commercial
version of Trafodion).
Trafodion is a very mature SQL engine on top of HBase/Hive, built on 20 years of IP that
Hewlett-Packard gave away to open source 2 years ago.
It supports many different join types (hash joins, nested joins, merge joins) with optimized overflow
to disk on an optimized pipelined architecture, full indexing capabilities, and
an optimized row format that will make your HBase table a lot faster than it is when using
one cell per column.
From a SQL capability standpoint for analytics queries, Trafodion can run all 99 TPC-DS queries.
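
To make the join terminology above concrete, here is a minimal Python sketch of an in-memory hash join (build a hash table on one input, then probe it with the other). It only illustrates the general technique and is not Trafodion's implementation; the function name `hash_join` and the sample rows are made up for this example, and there is no overflow to disk.

```python
from collections import defaultdict

def hash_join(left_rows, right_rows, key):
    """Inner-join two lists of dicts on `key` using a build/probe hash join."""
    # Build phase: index the (ideally smaller) left input by join key.
    buckets = defaultdict(list)
    for row in left_rows:
        buckets[row[key]].append(row)
    # Probe phase: stream the right input and emit one merged row per match.
    joined = []
    for row in right_rows:
        for match in buckets.get(row[key], []):
            joined.append({**match, **row})
    return joined

# Hypothetical sample data, for illustration only.
users = [{"user_id": 1, "name": "a"}, {"user_id": 2, "name": "b"}]
orders = [
    {"user_id": 1, "total": 10},
    {"user_id": 1, "total": 5},
    {"user_id": 3, "total": 7},
]
result = hash_join(users, orders, "user_id")
```

Each input is read once, so the cost grows with the sum of the input sizes rather than their product, which is the main reason a hash join usually beats a naive nested-loop join on large tables.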
Hope this helps,

-----Original Message-----
From: wenxing zheng [mailto:wenxing.zheng@gmail.com] 
Sent: Friday, September 29, 2017 7:24 AM
To: dev@hbase.apache.org
Subject: Re: Performance issue in the Join query on the HBase tables

Thanks, Ted.

We haven't tried Phoenix yet. In the performance tests on the official Phoenix site,
I didn't find a report on join queries, so I'm not sure whether it would be much better.

On Fri, Sep 29, 2017 at 8:01 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Have you looked at Phoenix?
> https://phoenix.apache.org/joins.html
> On Fri, Sep 29, 2017 at 3:25 AM, wenxing zheng <wenxing.zheng@gmail.com> wrote:
> > Dear all,
> >
> > I have 3 big HBase tables, each with millions of rows (the rows are
> > synced from a MySQL DB via the binlog). For each HBase table, we have a
> > corresponding external table in Hive, stored by
> > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. The advantage is
> > that we can always stay in sync with the production DB and provide
> > random access by key.
> >
> > Now our business needs to do some analysis on those tables with join
> > queries. What's the best practice for this?
> >
> > From my experiments, I found that with Spark SQL on HBase or
> > Hive, the job ran very slowly and saturated the network
> > bandwidth. But Hive SQL works very well when run directly against
> > HDFS files (after making a copy of the data to HDFS files).
> >
> > I would appreciate any advice on what the problem might be here and
> > how to optimize the job.
> > Regards, Wenxing
> >