hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: how to do parallel scanning in map reduce using hbase as input?
Date Tue, 22 Jul 2014 04:11:13 GMT
How many regions now?

Do you still have 20 concurrent mappers running?  Are your machines loaded w/
4 map tasks on each?  Can you up the number of concurrent mappers?  Can you
get an idea of your scan rates?  Are all map tasks scanning at the same rate?
Does one task lag the others?  Do you emit stats on each map task, such as
rows processed?  Can you figure out your bottleneck?  Are you seeking disk all
the time?  Is anything else running while this big scan is going on?  How big
are your cells?  Do you have one or more column families?  How many columns?

For average region size, run du (hadoop fs -du) on the HDFS region
directories, then sum the sizes and divide by the region count.
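
The sum-and-divide step above can be sketched in a few lines of Python. This is only an illustration, under some assumptions that are not from this thread: that the first column of each du line is the size in bytes, that the last column is the path, and that region directories are the ones named with a 32-character hex string. The function name average_region_size is made up for this example.

```python
import re
from typing import Iterable

# Assumption: region directories are named by a 32-character hex string;
# anything else under the table directory (e.g. .tmp) is skipped.
REGION_DIR = re.compile(r"^[0-9a-f]{32}$")


def average_region_size(du_lines: Iterable[str]) -> float:
    """Return the mean size in bytes of the region directories in du output."""
    sizes = []
    for line in du_lines:
        fields = line.split()
        # Skip headers such as "Found N items" and blank lines.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        # Last path component of the last column is the directory name.
        name = fields[-1].rstrip("/").rsplit("/", 1)[-1]
        if REGION_DIR.match(name):
            sizes.append(int(fields[0]))
    if not sizes:
        raise ValueError("no region directories found in du output")
    return sum(sizes) / len(sizes)
```

To use it, capture the output of hadoop fs -du on the table directory and pass the lines to the function; the exact directory depends on your hbase.rootdir and HBase version.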


On Mon, Jul 21, 2014 at 7:30 PM, Li Li <fancyerii@gmail.com> wrote:

> Could anyone help? I now have about 1.1 billion nodes, and it takes 2
> hours to finish a map reduce job.
> ---------- Forwarded message ----------
> From: Li Li <fancyerii@gmail.com>
> Date: Thu, Jun 26, 2014 at 3:34 PM
> Subject: how to do parallel scanning in map reduce using hbase as input?
> To: user@hbase.apache.org
> My table has about 700 million rows and about 80 regions. Each task
> tracker is configured to run 4 mappers and 4 reducers at the same time.
> The hadoop/hbase cluster has 5 nodes, so at most 20 mappers run at
> once. It takes more than an hour to finish the mapper stage.
> The hbase cluster's load is very low, about 2,000 requests per second.
> I think one mapper per region is too few. How can I run more than
> one mapper for a region so that the job can take full advantage of
> the computing resources?
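
One way to think about "more than one mapper for a region" is to cut each region's [start, stop) row-key range into several smaller ranges and give each its own scan. The Python below is a rough sketch of that key arithmetic only. It is not HBase's API, the function name split_key_range and the width parameter are made up for illustration, and it assumes row keys are spread fairly evenly across the byte range, which may not hold for your table. In a real job, logic like this would most likely sit in a custom input format that overrides getSplits on TableInputFormatBase.

```python
from typing import List, Tuple


def split_key_range(start: bytes, stop: bytes, parts: int,
                    width: int = 8) -> List[Tuple[bytes, bytes]]:
    """Split the row-key range [start, stop) into up to `parts` sub-ranges.

    Keys are compared on their first `width` bytes, read as a big-endian
    integer. An empty start means "beginning of table" and an empty stop
    means "end of table", following the HBase convention.
    """
    if parts < 1:
        raise ValueError("parts must be >= 1")
    lo = int.from_bytes(start[:width].ljust(width, b"\x00"), "big")
    hi = (int.from_bytes(stop[:width].ljust(width, b"\x00"), "big")
          if stop else 256 ** width)
    if hi <= lo:
        raise ValueError("stop key must sort after start key")
    # Evenly spaced interior cut points; drop duplicates and any cut that
    # collapses onto the start when the range is narrower than `parts`.
    cuts = sorted({lo + (hi - lo) * i // parts for i in range(1, parts)} - {lo})
    bounds = [start] + [c.to_bytes(width, "big") for c in cuts] + [stop]
    return list(zip(bounds[:-1], bounds[1:]))
```

Each returned pair could become the start and stop row of one scan. Interpolating evenly over the byte range yields evenly sized splits only if the keys themselves are evenly distributed; with skewed keys, some mappers would still get most of the rows.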
