hbase-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: ETL HBase HFile+HLog to ORC(or Parquet) file?
Date Fri, 21 Oct 2016 20:57:10 GMT
Create an external table in Hive on the HBase table. Pretty straightforward.

hive> CREATE EXTERNAL TABLE marketDataHbase (key STRING, ticker STRING,
timecreated STRING, price STRING)

    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key,price_info:ticker,price_info:timecreated,price_info:price")

    TBLPROPERTIES ("hbase.table.name" = "marketDataHbase");



Then create a normal table in Hive stored as ORC:


CREATE TABLE IF NOT EXISTS marketData (
     KEY string
   , TICKER string
   , TIMECREATED string
   , PRICE float
)
PARTITIONED BY (DateStamp  string)
STORED AS ORC
TBLPROPERTIES (
"orc.create.index"="true",
"orc.bloom.filter.columns"="KEY",
"orc.bloom.filter.fpp"="0.05",
"orc.compress"="SNAPPY",
"orc.stripe.size"="16777216",
"orc.row.index.stride"="10000" )
;
--show create table marketData;
--Populate target table
INSERT OVERWRITE TABLE marketData PARTITION (DateStamp = "${TODAY}")
SELECT
      KEY
    , TICKER
    , TIMECREATED
    , PRICE
FROM marketDataHbase;


Run this job as a cron job every so often.
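
A minimal sketch of how that cron job could look (script name, path, and
schedule are illustrative, not anything from the original post): resolve
today's date for the DateStamp partition, then hand the INSERT to the hive
CLI. The script below only prints the generated statement so it is safe to
dry-run; in the real job you would execute it with `hive -e "$HQL"`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the daily refresh of the ORC table.
set -euo pipefail

# Partition value for today, e.g. 2016-10-21.
TODAY=$(date +%Y-%m-%d)

# The INSERT from the email, with the partition value substituted in.
HQL="
INSERT OVERWRITE TABLE marketData PARTITION (DateStamp = '${TODAY}')
SELECT KEY, TICKER, TIMECREATED, PRICE
FROM marketDataHbase;
"

# In the real cron job you would run:  hive -e "$HQL"
# Printed here so the script can be tested without a Hive cluster.
echo "$HQL"
```

A matching crontab entry (hypothetical path), running nightly at 01:00,
might be: `0 1 * * * /path/to/refresh_marketdata.sh`.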


HTH



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 21 October 2016 at 21:48, Demai Ni <nidmgg@gmail.com> wrote:

> hi,
>
> I am wondering whether there are existing methods to ETL HBase data to
> ORC(or other open source columnar) file?
>
> I understand in Hive "INSERT INTO Hive_ORC_Table SELECT * FROM
> Hive_HBase_Table" can probably get the job done. Is this the common way to
> do so? Performance is acceptable and able to handle the delta update in the
> case HBase table changed?
>
> I did a bit google, and find this
> https://community.hortonworks.com/questions/2632/loading-
> hbase-from-hive-orc-tables.html
>
> which is another way around.
>
> Will it perform better(comparing to above Hive stmt) if using either
> replication logic or snapshot backup to generate ORC file from hbase tables
> and with incremental update ability?
>
> I hope to have as few dependencies as possible. In the example of ORC, it would
> only depend on Apache ORC's API, and not depend on Hive.
>
> Demai
>
