hbase-user mailing list archives

From Demai Ni <nid...@gmail.com>
Subject ETL HBase HFile+HLog to ORC(or Parquet) file?
Date Fri, 21 Oct 2016 20:48:08 GMT

I am wondering whether there are existing methods to ETL HBase data into
ORC (or another open source columnar format) files.

I understand that in Hive, "INSERT INTO TABLE Hive_ORC_Table SELECT * FROM
HIVE_HBase_Table" can probably get the job done. Is this the common way to
do so? Is the performance acceptable, and can it handle delta updates when
the HBase table changes?

I googled a bit and found this

which is another way around.

Would it perform better (compared to the Hive statement above) to use either
replication logic or snapshot backups to generate ORC files from HBase
tables, with the ability to apply incremental updates?
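
To make "incremental update" concrete, here is a minimal Python sketch of the
merge step it implies: take the cells from a previous export and apply a batch
of captured changes, keeping the newest version of each cell. It is only an
illustration under stated assumptions. `Edit`, `Snapshot`, and `apply_delta`
are hypothetical names, not HBase or ORC APIs; only the latest version of each
cell is kept; and reading HFiles/HLogs or writing ORC files is out of scope.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Edit(NamedTuple):
    """One captured change (hypothetical shape, not an HBase type)."""
    row: str
    column: str               # e.g. "cf:qualifier"
    timestamp: int
    value: Optional[bytes]    # None marks a delete of that cell


# row key -> column -> (timestamp, value); latest version only (assumption)
Snapshot = Dict[str, Dict[str, Tuple[int, bytes]]]


def apply_delta(base: Snapshot, edits: List[Edit]) -> Snapshot:
    """Return a new snapshot with the edits applied; base is not modified."""
    merged: Snapshot = {row: dict(cols) for row, cols in base.items()}
    # Apply oldest first; at equal timestamps, apply deletes after puts
    # so that the delete wins (a simplification of real delete markers).
    for e in sorted(edits, key=lambda e: (e.timestamp, e.value is None)):
        cols = merged.setdefault(e.row, {})
        current = cols.get(e.column)
        if current is not None and current[0] > e.timestamp:
            continue  # the snapshot already holds a newer version
        if e.value is None:
            cols.pop(e.column, None)
        else:
            cols[e.column] = (e.timestamp, e.value)
        if not cols:
            merged.pop(e.row, None)  # drop rows left with no cells
    return merged


if __name__ == "__main__":
    base = {"r1": {"cf:a": (1, b"x")}, "r2": {"cf:a": (1, b"y")}}
    edits = [
        Edit("r1", "cf:a", 2, b"z"),    # update
        Edit("r2", "cf:a", 3, None),    # delete
        Edit("r3", "cf:b", 2, b"n"),    # insert
    ]
    print(apply_delta(base, edits))
```

The merged result would then be rewritten as a new columnar file, since ORC
files are not updated in place.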

I hope to have as few dependencies as possible. In the ORC case, I would
like to depend only on Apache ORC's API and not on Hive.

