hbase-user mailing list archives

From Rakhi Khatwani <rakhi.khatw...@gmail.com>
Subject Re: Custom Input Split
Date Wed, 22 Apr 2009 12:48:25 GMT
Hi Lars,
           Thanks for the suggestion, I also figured out my problem using

but my table had only one region but i still wanted to split the input into
4 maps.
so i am basically overriding the getSplits() method in
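
A minimal sketch of the boundary arithmetic behind such a custom split, in plain Java with no HBase dependency. It assumes you already have a sorted list of row keys (for example, from a sampling scan) and that an empty string marks an open start or end row. The class name SplitSketch and the method splitRanges are hypothetical, chosen for illustration only. In a real job, each range would be wrapped in a split object inside the overridden getSplits() of a TableInputFormatBase subclass, which is not shown here because the constructors differ between HBase versions.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    /**
     * Divides the sorted keys into at most numSplits contiguous ranges of
     * nearly equal size. Each range is {startKey, endKey}: start inclusive,
     * end exclusive. An empty string means "open" (assumed convention).
     */
    public static List<String[]> splitRanges(List<String> sortedKeys, int numSplits) {
        List<String[]> ranges = new ArrayList<>();
        int total = sortedKeys.size();
        if (total == 0 || numSplits <= 0) {
            return ranges;
        }
        int splits = Math.min(numSplits, total); // never create empty splits
        int base = total / splits;               // rows every split receives
        int extra = total % splits;              // first 'extra' splits get one more row
        int start = 0;
        for (int i = 0; i < splits; i++) {
            int end = start + base + (i < extra ? 1 : 0);
            String startKey = (i == 0) ? "" : sortedKeys.get(start);
            String endKey = (end >= total) ? "" : sortedKeys.get(end);
            ranges.add(new String[] {startKey, endKey});
            start = end;
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            keys.add(String.format("row%02d", i));
        }
        // Four ranges for four maps, even though the table has one region.
        for (String[] r : splitRanges(keys, 4)) {
            System.out.println("[" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

With 10 keys and 4 splits, the first two ranges cover 3 rows each and the last two cover 2 rows each, so no map is left idle.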

One more question
is there any method in hbase API which can count the number of rows in a table?
i tried googling it and all i came across is a RowCounter class which is a
mapreduce job to count the number of rows. but i really dont know how to use
it. any suggestions?


On Wed, Apr 22, 2009 at 4:30 AM, Lars George <lars@worldlingo.com> wrote:

> Hi Rakhi,
> This is all done in the TableInputFormatBase class, which you can extend
> and then override the getSplits() function:
> http://hadoop.apache.org/hbase/docs/r0.19.1/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html
> This is where you can then specify how many rows per map are assigned.
> Really straightforward as I see it. I have used it to implement a special
> "only use N regions" support where I can run a sample subset against a MR
> job. For example, only map 5 out of 8K regions of a table.
> The default one will always split all regions into N maps. Hence the
> recommendation to set the number of maps to the number of regions in a
> table. If you set it to something lower, then it will split the regions into
> a smaller number but with more rows per map, i.e. each map gets more than
> one region to process.
> Look into the source of the above class and it should be obvious - I hope.
> Lars
> Rakhi Khatwani wrote:
>> Hi,
>>     I have a table with N records,
>>     now i want to run a map reduce job with 4 maps and 0 reduces.
>>     is there a way i can create my own custom input split so that i can
>> send 'n' records to each map??
>>    if there is a way, can i have a sample code snippet to gain better
>> understanding?
>> Thanks
>> Raakhi.
