hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: map reduce range of records from hbase table
Date Wed, 08 Oct 2008 21:13:57 GMT
Cedric Ho wrote:
> Hi all,
>
> I am using 0.18.0 and have successfully used data from hbase table as
> input to my map/reduce job.
>
> I wonder how to specify a subset of records from a table instead of
> taking all records as input.
> Such as a range of the row keys or maybe by specific values of certain columns.
>   
You'll have to subclass the TableInputFormat.

There is an example in the javadoc on subclassing TIF: 
http://hadoop.apache.org/hbase/docs/r0.18.0/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html

(Sorry, the example is mangled.  Do a get of the html source to see 
non-garbled code).

The example shows you how to set a filter.  Filters can filter on rows 
and values.

To work against a subset, you'd probably need to play with getSplits in 
your subclass.  By default, it basically returns as many splits as there 
are regions in your table, so it's always the whole table.  Filters could 
stop unwanted rows being returned, but maybe it's better if the rows 
weren't considered in the first place; hence the need to override 
getSplits.
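
A rough sketch of the pruning idea, in plain Java rather than against the 
HBase classes: KeyRange, SplitPruning and clip are hypothetical names made 
up for illustration, not part of TableInputFormatBase.  It assumes each 
region is described by a start key (inclusive) and an end key (exclusive), 
with an empty string standing for the start or end of the table, and it 
compares keys as Java strings, where real row keys are bytes.  A getSplits 
override could apply the same clipping to the per-region splits it would 
otherwise return.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one table split; NOT an HBase class.
final class KeyRange {
    final String start; // inclusive; "" means start of table (assumption)
    final String end;   // exclusive; "" means end of table (assumption)

    KeyRange(String start, String end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return "[" + start + "," + end + ")";
    }
}

// Hypothetical helper showing how splits could be narrowed to a key range.
final class SplitPruning {
    // Keep only the part of each region that overlaps [wantStart, wantEnd).
    static List<KeyRange> clip(List<KeyRange> regions, String wantStart, String wantEnd) {
        List<KeyRange> out = new ArrayList<>();
        for (KeyRange r : regions) {
            // Later of the two start keys; "" sorts before every other key.
            String s = r.start.compareTo(wantStart) > 0 ? r.start : wantStart;

            // Earlier of the two end keys; "" means unbounded.
            String e;
            if (r.end.isEmpty()) {
                e = wantEnd;
            } else if (wantEnd.isEmpty()) {
                e = r.end;
            } else {
                e = r.end.compareTo(wantEnd) < 0 ? r.end : wantEnd;
            }

            // Emit the clipped range only if it is non-empty.
            if (e.isEmpty() || s.compareTo(e) < 0) {
                out.add(new KeyRange(s, e));
            }
        }
        return out;
    }
}
```

With three regions split at "g" and "p", asking for rows from "h" up to 
"k" leaves a single split covering only that range, so the other two 
regions are never scanned.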

St.Ack

