hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Splitting up an HBase Table into partitions
Date Tue, 17 Mar 2015 13:19:56 GMT
HBase doesn’t have partitions.  It has regions.

Splits are computed against the regions, so for a scan that covers the whole table, n regions give you n splits (one split per region).

Please don’t confuse partitions and regions; they are not the same thing, and the terms are not interchangeable.
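
To illustrate the one-split-per-region idea without any MapReduce context, here is a minimal sketch in plain Java. It is not TableInputFormat.getSplits() and does not build InputSplit objects; it only pairs region boundaries into key ranges. The names RegionRanges, KeyRange and fromBoundaries are hypothetical. The assumption is that, with the HBase 1.0+ client, the two arrays would come from RegionLocator.getStartEndKeys(), and that each resulting range would then bound one Scan (and so one partition of a custom RDD).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: turns region boundaries into one key range per region. */
final class RegionRanges {

    /** Hypothetical holder for one region's range: [startKey, stopKey). */
    static final class KeyRange {
        final byte[] startKey; // empty array = start of table
        final byte[] stopKey;  // empty array = end of table

        KeyRange(byte[] startKey, byte[] stopKey) {
            this.startKey = startKey;
            this.stopKey = stopKey;
        }

        @Override
        public String toString() {
            return "[" + show(startKey) + ", " + show(stopKey) + ")";
        }
    }

    /**
     * Pairs startKeys[i] with endKeys[i]. Assumption: in a real client the
     * arrays would be the two halves of RegionLocator.getStartEndKeys().
     */
    static List<KeyRange> fromBoundaries(byte[][] startKeys, byte[][] endKeys) {
        if (startKeys.length != endKeys.length) {
            throw new IllegalArgumentException("start/end key counts differ");
        }
        List<KeyRange> ranges = new ArrayList<>(startKeys.length);
        for (int i = 0; i < startKeys.length; i++) {
            // n regions in, n ranges out: one range per region.
            ranges.add(new KeyRange(startKeys[i], endKeys[i]));
        }
        return ranges;
    }

    /** Renders a key for display; an empty key marks an open-ended boundary. */
    private static String show(byte[] key) {
        return key.length == 0 ? "<open>" : new String(key, StandardCharsets.UTF_8);
    }
}
```

With three regions split at "g" and "p", this yields three ranges. Filtering rows by format inside each range would still be up to the scan (for example, a row-key prefix or filter), which this sketch does not cover.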


> On Mar 17, 2015, at 7:30 AM, Gokul Balakrishnan <royalgok@gmail.com> wrote:
> 
> Hi,
> 
> My requirement is to partition an HBase Table and return a group of records
> (i.e. rows having a specific format) without having to iterate over all of
> its rows. These partitions (which should ideally be along regions) will
> eventually be sent to Spark but rather than use the HBase or Hadoop RDDs
> directly, I'll be using a custom RDD which recognizes partitions as the
> aforementioned group of records.
> 
> I was looking at achieving this through creating InputSplits through
> TableInputFormat.getSplits(), as being done in the HBase RDD [1] but I
> can't figure out a way to do this without having access to the mapred
> context etc.
> 
> Would greatly appreciate if someone could point me in the right direction.
> 
> [1]
> https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/com/cloudera/spark/hbase/HBaseScanRDD.scala
> 
> Thanks,
> Gokul

The opinions expressed here are mine, while they may reflect a cognitive thought, that is
purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com





