hbase-user mailing list archives

From Hari Krishna <harikrishna.gogin...@gmail.com>
Subject Pre splitting the HBase Table for specific row key design
Date Fri, 27 Dec 2013 06:02:15 GMT

We are planning to migrate from a CDH3 cluster to a CDH4 cluster, and as part of
the migration we also plan to use HBase instead of the Hive warehouse that
we use on the CDH3 cluster. Every day we bring data from Oracle into
Hadoop using Sqoop, and we import from 10 different database schemas.

In the Hive warehouse we maintain a table with the schema name as the top-level
partition and the date as a sub-partition inside each schema partition. Each
day's data for the table is stored in that day's date partition.

In HBase we have designed a table whose row key is the concatenation of (byte
array value of the bucket number (values range from 0 to 15, so we maintain 16
buckets in total), MD5(schema), MD5(date), byte array value of
pkid). It is working as expected: we are able to retrieve data by
schema and date, which is our key use case. Each bucket has a
key range of 0 to long max.
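
To make the key layout concrete, here is a minimal sketch of how such a row key could be assembled. It uses only the JDK, and several details are assumptions rather than our actual code: the bucket number is stored as a single byte, pkid is a long, the schema and date are hashed as UTF-8 strings, and the names RowKeySketch, rowKey and scanPrefix are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RowKeySketch {
    static final int NUM_BUCKETS = 16; // buckets 0..15, as described above
    static final int MD5_LEN = 16;     // an MD5 digest is always 16 bytes

    // MD5 of a string, hashed as UTF-8 (the encoding is an assumption).
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Assumed layout: [bucket: 1 byte][MD5(schema): 16][MD5(date): 16][pkid: 8]
    static byte[] rowKey(int bucket, String schema, String date, long pkid) {
        if (bucket < 0 || bucket >= NUM_BUCKETS) {
            throw new IllegalArgumentException("bucket out of range: " + bucket);
        }
        return ByteBuffer.allocate(1 + MD5_LEN + MD5_LEN + 8)
                .put((byte) bucket)
                .put(md5(schema))
                .put(md5(date))
                .putLong(pkid) // big-endian
                .array();
    }

    // Prefix shared by every row of one (bucket, schema, date) combination.
    static byte[] scanPrefix(int bucket, String schema, String date) {
        return ByteBuffer.allocate(1 + MD5_LEN + MD5_LEN)
                .put((byte) bucket)
                .put(md5(schema))
                .put(md5(date))
                .array();
    }
}
```

With this layout, reading one schema and date means running one prefix scan per bucket, 16 scans in total.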

Now we are facing a challenge in pre-splitting the table (let's say the table
is named transactions). Can anyone help me with this?
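
One possible starting point, sketched under the assumption that the bucket number is a single leading byte of the row key, is to split on the bucket boundaries so that each of the 16 buckets starts in its own region. The class and method names below are hypothetical, and the code only computes the split keys with the JDK; it does not talk to a cluster.

```java
public class PreSplitSketch {

    // For N buckets, the N-1 split points {0x01}, {0x02}, ..., {N-1}
    // yield N regions, one per bucket prefix.
    static byte[][] bucketSplitKeys(int numBuckets) {
        if (numBuckets < 2 || numBuckets > 256) {
            throw new IllegalArgumentException("numBuckets must be in 2..256");
        }
        byte[][] splits = new byte[numBuckets - 1][];
        for (int b = 1; b < numBuckets; b++) {
            // One-byte split key; HBase orders row keys as unsigned bytes.
            splits[b - 1] = new byte[] { (byte) b };
        }
        return splits;
    }

    public static void main(String[] args) {
        // Print the split points for the 16-bucket design.
        for (byte[] split : bucketSplitKeys(16)) {
            System.out.printf("0x%02X%n", split[0]);
        }
    }
}
```

The resulting array is the kind of value that could be handed to HBaseAdmin.createTable(HTableDescriptor, byte[][]) when creating the transactions table. Whether one region per bucket is fine-grained enough depends on data volume, which is not known here.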

