hbase-user mailing list archives

From Florin P <florinp...@yahoo.com>
Subject A question about HBase MapReduce
Date Fri, 25 May 2012 06:35:05 GMT

I've read Lars George's blog post http://www.larsgeorge.com/2009/05/hbase-mapreduce-101-part-i.html.
At the end of the article he mentions: "In the next post I will show you how to import
data from a raw data file into a HBase table and how you eventually process the data in the
HBase table. We will address questions like how many mappers and/or reducers are needed and
how can I improve import and processing performance." I searched the blog for these topics,
but there seems to be no related article. Do you know if he covered these subjects in a
different post or book? In particular, I am interested in the following:

1. How can you set the number of mappers?
2. Can the number of mappers be set per region server? If yes, how?
3. How does a large number of mappers affect data locality?
4. Is this algorithm for computing the number of mappers (https://issues.apache.org/jira/browse/HBASE-1172)
still applicable?
"The number of mappers specified when using TableInputFormat is strictly
followed if less than total regions on the input table. If greater, the
number of regions is used.
This will modify the splitting algorithm to do the following:
	* Specify 0 mappers when you want # mappers = # regions
	* If you specify fewer mappers than regions, will use exactly the number you specify
	  based on the current algorithm
	* If you specify more mappers than regions, will divide regions up by determining
	  [start,X) [X,end). The number of mappers will always be a multiple of number of
	  regions. This is so we do not have scanners spanning multiple regions.
There is an additional issue in that the default number of mappers
in JobConf is set to 1. That means if a user does not explicitly set
number of map tasks, a single mapper will be used."
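
To check that I read the quoted description correctly, here is a minimal sketch of the
mapper-count rule in Python. It is not HBase code: the function name planned_mappers is
hypothetical, and rounding down to a multiple of the region count in the "more mappers than
regions" case is my own assumption, since the description only says that the result is
always a multiple of the number of regions.

```python
def planned_mappers(requested: int, regions: int) -> int:
    """Return the mapper count implied by the HBASE-1172 description.

    Hypothetical helper for illustration only; not part of the HBase API.
    """
    if regions <= 0:
        raise ValueError("the table must have at least one region")
    if requested < 0:
        raise ValueError("requested mapper count cannot be negative")

    # 0 means "one mapper per region".
    if requested == 0:
        return regions

    # Fewer mappers than regions (or exactly as many): use the requested number.
    if requested <= regions:
        return requested

    # More mappers than regions: each region is cut into equal sub-ranges,
    # so the total is a multiple of the region count.
    # ASSUMPTION: round down to the largest multiple not exceeding the request.
    splits_per_region = requested // regions
    return splits_per_region * regions
```

For example, with 10 regions, requesting 0 mappers gives 10, requesting 4 gives 4, and
requesting 25 gives 20 under the rounding assumption above.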

I look forward to your answers. Thank you.

Kind regards, Florin