hbase-dev mailing list archives

From wenxing zheng <wenxing.zh...@gmail.com>
Subject Re: Performance issue in the Join query on the HBase tables
Date Fri, 29 Sep 2017 12:23:30 GMT
Thanks, Ted.

We haven't tried Phoenix yet. I couldn't find any report on join queries
in the performance tests on the official Phoenix site, so I'm not sure
whether it would be much better.

On Fri, Sep 29, 2017 at 8:01 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Have you looked at Phoenix ?
>
> https://phoenix.apache.org/joins.html
>
> On Fri, Sep 29, 2017 at 3:25 AM, wenxing zheng <wenxing.zheng@gmail.com>
> wrote:
>
> > Dear all,
> >
> > I have 3 big HBase tables, each with millions of rows (the rows are
> > synced from a MySQL DB via the binlog). For each HBase table, we have a
> > corresponding external table in Hive, stored by
> > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. The advantage is
> > that we can always stay in sync with the production DB and provide
> > random access by key.
> >
> > Now our business needs to run some analysis on those tables with join
> > queries. What's the best practice for doing this?
> >
> > From my experiments, I found that with Spark SQL on HBase or Hive, the
> > job ran very slowly and saturated the network bandwidth. But Hive SQL
> > works very well when run directly against files in HDFS (after making a
> > copy of the data to HDFS files).
> >
> > I would appreciate any advice on what the problem might be here and on
> > how to optimize the job.
> > Regards, Wenxing
> >
>
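
The HDFS-copy approach described in the original message could be sketched
roughly as follows. This is only an illustration, not the poster's actual
setup: the table names (orders_hbase, customers_hbase), the join key
(customer_id), and the choice of ORC as the file format are all assumptions.
The script just builds HiveQL strings: one CREATE TABLE ... AS SELECT per
HBase-backed Hive table to materialize a snapshot on HDFS, then a join that
reads only the snapshots.

```python
def snapshot_statement(hbase_backed_table: str, snapshot_table: str) -> str:
    """Build a HiveQL CTAS that copies an HBase-backed Hive table to ORC files on HDFS."""
    return (
        f"CREATE TABLE {snapshot_table} STORED AS ORC AS "
        f"SELECT * FROM {hbase_backed_table}"
    )


def join_statement(left: str, right: str, key: str) -> str:
    """Build a HiveQL inner join between two snapshot tables on a shared key column."""
    return (
        f"SELECT l.*, r.* FROM {left} l "
        f"JOIN {right} r ON l.{key} = r.{key}"
    )


# Hypothetical table names, for illustration only.
TABLES = ["orders", "customers"]

# One snapshot statement per HBase-backed table, then the join over the snapshots.
statements = [snapshot_statement(f"{t}_hbase", f"{t}_snapshot") for t in TABLES]
statements.append(
    join_statement("orders_snapshot", "customers_snapshot", "customer_id")
)

if __name__ == "__main__":
    for s in statements:
        print(s + ";")
```

The printed statements could then be run through whatever Hive client is
already in use. The snapshots would need to be refreshed periodically, so
this trades freshness for scan speed.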
