hbase-user mailing list archives

From a <kawa.a...@gmail.com>
Subject Custom HBase table split that sends colocated rows to the same region
Date Wed, 04 Apr 2012 14:52:55 GMT
Hello,

Suppose that I have a "tall-narrow" HBase table with a composite key, e.g. 
{class_id}#{student_id}.

Sample data looks like this:

ROW_KEY  |   ONE COLUMN FAMILY
----------------------------------------------------------------
1        |   name = "Object Oriented Programming"
         |   location = "Building A"
         |   semester = "Winter"
         |   // much other information about the class
----------------------------------------------------------------
1#1      |   name = "Alice White"
1#2      |   name = "Betty Lipcon"
// many other records related to the class with ID = 1
----------------------------------------------------------------
// many other records related to the classes with ID = 2, 3, 4, ..., N


I would like to use this HBase table as the input source for my MapReduce job, 
where the mapper emits <key, value> pairs such that:
key = {class_id}#{student_id},
value = some information about the corresponding class.

Because row keys are sorted lexicographically, this would be easy to implement 
if I could split the HBase table into regions such that all colocated rows 
(those with the same row prefix, i.e. {class_id}) reside in the same region. 
Then, for each group of such colocated rows, I could use the first row to get 
the information about the class and emit it with the row key of each remaining 
row.
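
To illustrate the per-group logic described above, here is a minimal sketch in 
plain Python. It does not use the HBase or MapReduce APIs; the function name 
emit_class_info, the SEPARATOR constant and the in-memory row format are 
assumptions made only for this illustration. It assumes rows arrive in sorted 
key order, as they would from a scan, and that "#" separates the two key parts.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Assumed separator between class_id and student_id in the composite key.
SEPARATOR = "#"

# A row is modelled as (row_key, columns); this is an illustration only,
# not an HBase Result object.
Row = Tuple[str, Dict[str, str]]


def emit_class_info(rows: Iterable[Row]) -> Iterator[Row]:
    """Yield (student_row_key, class_columns) for rows given in sorted key order."""
    current_class_id: Optional[str] = None
    current_class_info: Dict[str, str] = {}
    for row_key, columns in rows:
        if SEPARATOR not in row_key:
            # A key without the separator is the class row that opens a group.
            current_class_id = row_key
            current_class_info = columns
            continue
        class_id = row_key.split(SEPARATOR, 1)[0]
        if class_id == current_class_id:
            # Pair the student row key with the remembered class information.
            yield row_key, current_class_info
        # Otherwise the class row was not seen by this reader (the group was
        # cut by a region boundary); this sketch simply skips such rows.
```

The sketch only works if the class row and all of its student rows are read by 
the same mapper, which is exactly why the split must not cut through a group. 
Note also that "#" sorts before the digits in byte order, so "1#2" comes before 
"10" and the groups stay contiguous even when the IDs are not zero-padded.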

So I would like to ask: is such a custom split easy to implement?
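
To make the question concrete, here is a minimal sketch of the kind of split I 
have in mind, again in plain Python rather than HBase code. The names 
align_split_key and region_index are hypothetical, and the model of a region as 
a half-open key range [start_key, end_key) is a simplification for this 
example. The idea is to truncate any proposed split key to its {class_id} 
prefix, so that the region boundary always falls on a class row.

```python
from bisect import bisect_right
from typing import List

# Assumed separator between class_id and student_id in the composite key.
SEPARATOR = "#"


def align_split_key(candidate: str) -> str:
    """Truncate a proposed split key to its class_id prefix.

    The class row key is the smallest key of its group, so a boundary placed
    there keeps the class row and all its student rows on the same side.
    """
    return candidate.split(SEPARATOR, 1)[0]


def region_index(row_key: str, split_keys: List[str]) -> int:
    """Return the index of the region holding row_key.

    split_keys must be sorted. Region 0 holds keys below split_keys[0];
    region i holds keys in [split_keys[i - 1], split_keys[i]).
    """
    return bisect_right(split_keys, row_key)
```

For example, a split proposed at "2#1" would leave the class row "2" in one 
region and the row "2#2" in the next, whereas the aligned key "2" puts the 
whole group for class 2 in the same region.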

I know that I could:
1) model it as a "flat-wide" table, where I would have everything I need in 
separate rows, or
2) use two MR jobs for that,

but I am interested in the best solution for a "tall-narrow" table with one MR 
job.

Many thanks in advance for any hints!




          







