hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: PerformanceEvaluation scan - how to read the results?
Date Wed, 25 Jan 2012 19:44:28 GMT
On Wed, Jan 25, 2012 at 6:21 AM, Tim Robertson
<timrobertson100@gmail.com> wrote:
> Hi all,
>

Hey Tim.

> This gave me 32 regions across 2 of our 3 region servers (we have HDFS
> across 17 nodes but only 3 machines running RS).
>

Did the balancer run?  I'd expect it to spread the regions across all
three servers.  Maybe a region is stuck in transition, which stops the
balancer from running (see the master log).


> And then the following to scan:
> $HADOOP_HOME/bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation scan 5
>

So it sounds like the scan is only going against two of the three servers.

> The output of the scan is:
> 12/01/25 15:11:02 INFO mapred.JobClient:     ROWS=5242850
> 12/01/25 15:11:02 INFO mapred.JobClient:     ELAPSED_TIME=1624832
> (job took 52 secs in reality)
>
> Can anyone elaborate on how I am meant to interpret these numbers
> please?  Looks like 3.2 rows per <timeunit>
>

Your MR job scanned 5M rows.  It looks like you had 5 clients so you
should have had 5 mappers running.  The ELAPSED_TIME is supposed to be
the sum of the elapsed time of all mappers.  The above looks way wrong
to me.
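
To make the mismatch concrete, here is a rough sketch of the arithmetic
using the counters quoted above.  It assumes ELAPSED_TIME is in
milliseconds and that there was one mapper per client; neither is
confirmed in this thread, and the names are only for illustration.

```python
# Counter values copied from the job output quoted above.
ROWS = 5_242_850
ELAPSED_TIME = 1_624_832  # assumption: milliseconds, summed over all mappers
WALL_CLOCK_SECS = 52      # "job took 52 secs in reality"
NUM_MAPPERS = 5           # assumption: one mapper per client


def rows_per_unit(rows, elapsed):
    """Rows scanned per unit of summed elapsed time."""
    return rows / elapsed


def mean_mapper_secs(elapsed_ms, mappers):
    """Average elapsed seconds per mapper if ELAPSED_TIME is a sum in ms."""
    return elapsed_ms / mappers / 1000.0


rate = rows_per_unit(ROWS, ELAPSED_TIME)
per_mapper = mean_mapper_secs(ELAPSED_TIME, NUM_MAPPERS)

print(f"rows per ELAPSED_TIME unit: {rate:.2f}")
print(f"mean time per mapper: {per_mapper:.0f} s (wall clock: {WALL_CLOCK_SECS} s)")
```

Under those assumptions the ratio matches the 3.2 rows per time unit
Tim computed, but the average mapper would have run for several
minutes, which cannot fit inside a 52 second job.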

> [I am trying to benchmark because our real data of 340M rows (215G on
> HDFS) takes 60 mins to scan which seems a lot]
>

Three servers?  Scanning in sequence?  What rate are you seeing per
server, Tim?  What kind of servers (I think you've posted your profile
to the list before but ... it was a while back (smile))?  How big are
the rows being returned?
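
As a starting point for those numbers, the figures quoted above (340M
rows, 215G, 60 minutes, three servers) give a rough aggregate rate.
This is only a sketch: it assumes "215G" means GiB of unreplicated data
and that the load is spread evenly over the servers, which may not
hold here.

```python
# Figures from the quoted message; the constant names are illustrative.
TOTAL_ROWS = 340_000_000
TOTAL_BYTES = 215 * 1024**3  # assumption: "215G" is GiB, without replication
SCAN_SECS = 60 * 60          # "takes 60 mins to scan"
NUM_SERVERS = 3              # assumption: load spread evenly over 3 servers

rows_per_sec = TOTAL_ROWS / SCAN_SECS
mib_per_sec = TOTAL_BYTES / SCAN_SECS / 1024**2
avg_row_bytes = TOTAL_BYTES / TOTAL_ROWS

print(f"aggregate:  {rows_per_sec:,.0f} rows/s, {mib_per_sec:.1f} MiB/s")
print(f"per server: {rows_per_sec / NUM_SERVERS:,.0f} rows/s, "
      f"{mib_per_sec / NUM_SERVERS:.1f} MiB/s")
print(f"average row size: {avg_row_bytes:.0f} bytes")
```

If only two of the three servers are actually serving regions, divide
by two instead to get the per-server load.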

Thanks,
St.Ack
