hbase-user mailing list archives

From tigertail <tyc...@yahoo.com>
Subject Re: map reduce range of records from hbase table
Date Thu, 11 Dec 2008 03:06:26 GMT

Hi Cedric,

Could you share with me your version of getSplits that feeds only a subset of
records to the job? I expect your method can select the subset based on row
keys as well as some column values. Thank you.

Cedric Ho wrote:
> Thanks for the solutions, I've tried overriding getSplits and it does
> what I need.
> But for the RowFilter, I guess it would also need to scan through all
> records and do filtering. So wouldn't it be the same if I do the
> filtering myself during the map phase?
> Cedric
> On Thu, Oct 9, 2008 at 5:13 AM, stack <stack@duboce.net> wrote:
>> Cedric Ho wrote:
>>> Hi all,
>>> I am using 0.18.0 and have successfully used data from hbase table as
>>> input to my map/reduce job.
>>> I wonder how to specify a subset of records from a table instead of
>>> taking all records as input.
>>> Such as a range of the row keys or maybe by specific values of certain
>>> columns.
>> You'll have to subclass the TableInputFormat.
>> There is an example in the javadoc on subclassing TIF:
>> http://hadoop.apache.org/hbase/docs/r0.18.0/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html
>> (Sorry, the example is mangled.  Do a get of the html source to see
>> non-garbled code).
>> The example shows you how to set a filter. Filters can filter on rows
>> and values.
>> To work against a subset, you'd probably need to play with getSplits in
>> your subclass. By default, it basically returns as many splits as there
>> are regions in your table, so it's always the whole table. Filters could
>> stop unwanted rows being returned, but maybe it's better if the rows weren't
>> considered in the first place; hence the need to override getSplits.
>> St.Ack
>> St.Ack
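
For anyone following the thread, the getSplits idea described above can be sketched roughly as follows. This is not Cedric's code and it does not use the HBase API: it is a stand-alone model, and every name in it (RowRangeSplits, Split, clipSplits) is hypothetical. It assumes row keys compare as plain Java strings (HBase compares raw bytes) and that an empty string means "start of table" or "end of table". The point is only to show how the default one-split-per-region list might be clipped to a requested row range so that regions outside the range are never scanned. Selecting on column values would still need a filter or a check in the map phase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model; none of these names come from the HBase API.
class RowRangeSplits {

    /** Half-open row range [startRow, endRow); an empty endRow means "to end of table". */
    static final class Split {
        final String startRow;
        final String endRow;

        Split(String startRow, String endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }

        @Override
        public String toString() {
            return "[" + startRow + ", " + endRow + ")";
        }
    }

    /**
     * Builds one split per region, keeping only regions that overlap
     * [wantStart, wantEnd) and trimming them to that range.
     * regionStartKeys must be sorted and begin with "" (start of table).
     */
    static List<Split> clipSplits(String[] regionStartKeys, String wantStart, String wantEnd) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.length; i++) {
            String regionStart = regionStartKeys[i];
            // A region ends where the next one starts; the last region is unbounded.
            String regionEnd = (i + 1 < regionStartKeys.length) ? regionStartKeys[i + 1] : "";

            // Skip regions that end at or before the wanted start.
            if (!regionEnd.isEmpty() && regionEnd.compareTo(wantStart) <= 0) {
                continue;
            }
            // Skip regions that start at or after the wanted end.
            if (!wantEnd.isEmpty() && regionStart.compareTo(wantEnd) >= 0) {
                continue;
            }

            // Trim the region to the wanted range.
            String start = regionStart.compareTo(wantStart) > 0 ? regionStart : wantStart;
            String end;
            if (regionEnd.isEmpty()) {
                end = wantEnd;
            } else if (wantEnd.isEmpty()) {
                end = regionEnd;
            } else {
                end = regionEnd.compareTo(wantEnd) < 0 ? regionEnd : wantEnd;
            }
            splits.add(new Split(start, end));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four made-up regions: ["", g), [g, n), [n, t), [t, end of table).
        String[] regionStartKeys = {"", "g", "n", "t"};
        for (Split s : clipSplits(regionStartKeys, "h", "p")) {
            System.out.println(s); // prints [h, n) then [n, p)
        }
    }
}
```

In a real TableInputFormatBase subclass the region start keys would come from the table itself and each split would be returned from getSplits in whatever split type that HBase version expects; those details are left out here because they depend on the release in use.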
