hbase-user mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: speeding up rowcount
Date Mon, 10 Oct 2011 00:44:56 GMT
Be aware that the contract for a scan is to return all rows sorted by rowkey, hence it cannot
scan regions in parallel by default. I have not played much with HBase and MapReduce, but if
order is not important you can split the scan into multiple scans and run them in parallel.
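
To make the idea concrete, here is a rough sketch of a split row count. It does not use the HBase client at all: the "table" is just a sorted Python list of row keys, and scan_range, parallel_row_count and the split points are hypothetical names made up for illustration. In a real cluster each sub-range would be its own scan with its own start and stop row, typically one per region.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def scan_range(sorted_keys, start, stop):
    """Yield keys in [start, stop) in order; stop=None means 'to the end'.

    Stand-in for a single ordered scan over one key range (assumption:
    the table is modelled as a sorted list of string row keys).
    """
    i = bisect_left(sorted_keys, start)
    while i < len(sorted_keys) and (stop is None or sorted_keys[i] < stop):
        yield sorted_keys[i]
        i += 1


def count_range(sorted_keys, start, stop):
    """Count rows by walking one sub-range, the way a scan would."""
    return sum(1 for _ in scan_range(sorted_keys, start, stop))


def parallel_row_count(sorted_keys, split_points, workers=4):
    """Cut the key space at split_points and count each piece concurrently.

    The pieces finish in arbitrary order, which is fine here because a
    count does not depend on row order.
    """
    bounds = [""] + sorted(split_points) + [None]
    ranges = list(zip(bounds[:-1], bounds[1:]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(
            lambda r: count_range(sorted_keys, r[0], r[1]), ranges
        )
        return sum(partial_counts)


if __name__ == "__main__":
    keys = [f"row{i:04d}" for i in range(1000)]
    print(parallel_row_count(keys, ["row0250", "row0500", "row0750"]))
```

The ranges are half-open, so a row sitting exactly on a split point is counted once, by the range that starts there. Giving up the global ordering is what makes the split possible; anything that needs rows back in rowkey order would have to merge the partial results again.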

----- Original Message -----
From: Tom Goren <tom@tomgoren.com>
To: user@hbase.apache.org
Sent: Sunday, October 9, 2011 8:07 AM
Subject: Re: speeding up rowcount

lol - i just ran a rowcount via mapreduce, and it took 6 hours for 7.5
million rows...

On Sun, Oct 9, 2011 at 7:50 AM, Rita <rmorgan466@gmail.com> wrote:

> Hi,
> I have been doing a rowcount via mapreduce and it's taking about 4-5 hours
> to count 500 million rows in a table. I was wondering if there are any
> mapreduce tunings I can do so it will go much faster.
> I have a 10-node cluster, each node with 8 CPUs and 64 GB of memory. Any
> tuning advice would be much appreciated.
> --
> --- Get your facts first, then you can distort them as you please.--
