hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: A question about HBase MapReduce
Date Fri, 25 May 2012 17:16:07 GMT

re: "data from raw data file into hbase table"

One approach is bulk loading.
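Here is a rough picture of what bulk loading involves. A MapReduce job writes HFiles whose rows are already sorted and partitioned by region, and those files are then handed to the region servers. In HBase releases of this period, `HFileOutputFormat.configureIncrementalLoad` sets the number of reducers to the table's current region count and configures a `TotalOrderPartitioner` on the region boundaries. The plain-Java sketch below models only that routing step, under the assumption that the region start keys are known and sorted. `RegionPartitionSketch` and its methods are hypothetical and are not part of the HBase API.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical model (not the HBase API) of routing rows to one reducer
 * per region before a bulk load. Row keys are compared as unsigned bytes,
 * the ordering HBase uses for row keys.
 */
final class RegionPartitionSketch {

    // Lexicographic comparison of byte arrays, treating bytes as unsigned.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) return x - y;
        }
        return a.length - b.length;
    }

    /**
     * Returns the index of the region whose [startKey, nextStartKey) range
     * contains the row. Assumes startKeys is sorted and startKeys[0] is the
     * empty key, as for the first region of an HBase table.
     */
    static int partitionFor(byte[] row, byte[][] startKeys) {
        int lo = 0;
        int hi = startKeys.length - 1;
        // Binary search for the last start key that is <= row.
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (compare(startKeys[mid], row) <= 0) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three regions: ["", "g"), ["g", "p"), ["p", end of table).
        byte[][] startKeys = { new byte[0], b("g"), b("p") };
        for (String row : new String[] {"apple", "g", "kiwi", "pear", "zebra"}) {
            System.out.println(row + " -> region " + partitionFor(b(row), startKeys));
        }
    }
}
```

With start keys "", "g" and "p", the rows apple, g, kiwi, pear and zebra go to regions 0, 1, 1, 2 and 2; a row equal to a start key belongs to the region that begins there. Each reducer then only writes rows for its own region, so every HFile fits inside one region.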


If he's talking about using an HBase table as the source of a MapReduce job, then
see this...


On 5/25/12 2:35 AM, "Florin P" <florinpico@yahoo.com> wrote:

>I've read Lars George's blog post at
>http://www.larsgeorge.com/2009/05/hbase-mapreduce-101-part-i.html
>At the end of the article he mentioned: "In the next post I will show you
>how to import data from a raw data file into a HBase table and how you
>eventually process the data in the HBase table. We will address questions
>like how many mappers and/or reducers are needed and how can I improve
>import and processing performance." I looked through the blog for these
>posts, but it seems that there is no related article. Do you know if he
>covered these subjects in a different post or book? In particular, I am
>interested in:
>1. How can you set the number of mappers?
>2. Can the number of mappers be set per region server? If yes, how?
>3. How can a large number of mappers affect data locality?
>4. Is this algorithm for computing the number of mappers
>(https://issues.apache.org/jira/browse/HBASE-1172) still available?
>"The number of mappers specified when using TableInputFormat is strictly
>followed if less than total regions on the input table. If greater, the
>number of regions is used.
>This will modify the splitting algorithm to do the following:
>	* Specify 0 mappers when you want # mappers = # regions
>	* If you specify fewer mappers than regions, will use exactly the number
>you specify based on the current algorithm
>	* If you specify more mappers than regions, will divide regions up by
>determining [start,X) [X,end). The number of mappers will always be a
>multiple of number of regions. This is so we do not have scanners
>spanning multiple regions.
>There is an additional issue in that the default number of mappers
>in JobConf is set to 1. That means if a user does not explicitly set
>number of map tasks, a single mapper will be used."
>I look forward to your answers. Thank you.
>Kind regards, Florin
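The mapper-count rules in the HBASE-1172 text quoted above can be modeled in a few lines. This is a minimal sketch, not TableInputFormat's code: `MapperCountSketch` and its methods are hypothetical, row keys are simplified to `long` values (real HBase row keys are byte arrays), and when more mappers than regions are requested it rounds down to a multiple of the region count. That rounding is an assumption, because the issue text only says the result will be "a multiple of number of regions".

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the mapper-count rules quoted from HBASE-1172.
 * Keys are simplified to small longs; real HBase row keys are byte arrays.
 */
final class MapperCountSketch {

    /** Rule described as current: the request is honored only up to the region count. */
    static int currentRule(int requested, int regions) {
        return Math.min(requested, regions);
    }

    /**
     * Proposed rule: 0 means one mapper per region; up to the region count the
     * request is used as-is; above it, the count is rounded down to a multiple
     * of the region count (the rounding direction is an assumption).
     */
    static int proposedRule(int requested, int regions) {
        if (requested == 0) return regions;
        if (requested <= regions) return requested;
        return (requested / regions) * regions;
    }

    /**
     * Splits one region [start, end) into n contiguous sub-ranges
     * [start, X1), [X1, X2), ..., [Xn-1, end), so no scan crosses a region
     * boundary. Assumes small key ranges so width * n cannot overflow.
     */
    static List<long[]> splitRegion(long start, long end, int n) {
        List<long[]> ranges = new ArrayList<>();
        long width = end - start;
        for (int i = 0; i < n; i++) {
            long lo = start + width * i / n;
            long hi = start + width * (i + 1) / n;
            ranges.add(new long[] {lo, hi});
        }
        return ranges;
    }

    public static void main(String[] args) {
        int regions = 4;
        for (int requested : new int[] {1, 3, 10}) {
            System.out.println("requested=" + requested
                + " current=" + currentRule(requested, regions)
                + " proposed=" + proposedRule(requested, regions));
        }
        System.out.println("proposed with 0 requested: " + proposedRule(0, regions));
        // 10 requested over 4 regions gives 8 mappers, i.e. 2 per region.
        for (long[] r : splitRegion(0, 100, 2)) {
            System.out.println("[" + r[0] + "," + r[1] + ")");
        }
    }
}
```

A request of 1, the JobConf default mentioned in the last quoted paragraph, yields a single mapper under both rules; under the proposal, passing 0 is how a job would ask for one mapper per region.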
