hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: HFileInputFormat for MapReduce
Date Fri, 10 Feb 2012 17:05:36 GMT
On Fri, Feb 10, 2012 at 3:21 AM, Tim Robertson
<timrobertson100@gmail.com> wrote:
> We are using PE scan to try and "standardize" as much as possible.

Fair enough.

> Since CDH3u3 is ongoing as I type, I'm not sure on the regions (<50
> regions on 3 RS with the PE TestTable).

Why are you not sure?  Isn't it just a matter of looking at the master UI?

If it's 50, that's a reasonable amount.

> I have not digested Ganglia yet, but I just ran the PE scan 10 with
> scanner cache sizes of 1,10,30,100,1000,10000.  Worryingly the
> performance was the same regardless of the cache size.

That is odd.
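
For reference, a rough sketch of why the cache size would normally be expected to matter: each scanner next() round trip returns up to `caching` rows, so the number of round trips should fall roughly as rows/caching. The 1M rows-per-client figure is the approximate PE number recalled later in this message, and the function name is made up for illustration. This only counts round trips; it says nothing about actual latency.

```python
import math

def scanner_round_trips(rows: int, caching: int) -> int:
    """Approximate number of next() round trips needed to fetch `rows` rows
    when each round trip returns up to `caching` rows."""
    if caching < 1:
        raise ValueError("caching must be >= 1")
    return math.ceil(rows / caching)

ROWS_PER_CLIENT = 1_000_000                   # assumption: roughly 1M rows per PE client
CACHE_SIZES = [1, 10, 30, 100, 1000, 10000]   # the sizes tried in the quoted run

for caching in CACHE_SIZES:
    print(caching, scanner_round_trips(ROWS_PER_CLIENT, caching))
```

If the round trips dominated, going from 1 to 10000 should show up clearly in the timings, which is why identical numbers look suspicious.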

So you are comparing TFIF scan to PE scan?  Is the TFIF being done in
an MR job with the same number of mappers as PE?

> $HADOOP_HOME/bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation scan 10

IIRC, what this does is run ten clients, each scanning 1M rows.
Each client does 1/10th of a keyspace that is 10M rows wide.  Do you
have >10M rows in your table?
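
To make the split concrete, here is a minimal sketch of carving a keyspace into equal contiguous ranges, one per client. It is not PE's actual code: `client_ranges` is a hypothetical helper, and the round 10M figure is only approximate (the job counter quoted below reports ROWS=10485700).

```python
def client_ranges(total_rows: int, clients: int):
    """Split [0, total_rows) into contiguous, equal-sized [start, stop) ranges,
    one per client. Any remainder rows are left out of this simple sketch."""
    per_client = total_rows // clients
    return [(i * per_client, (i + 1) * per_client) for i in range(clients)]

# Assumption: a 10M-row keyspace scanned by ten clients, as recalled above.
ranges = client_ranges(10_000_000, 10)
for start, stop in ranges:
    print(start, stop)
```

If the table holds fewer rows than the keyspace the clients expect, some clients would scan little or nothing, which would skew any comparison.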

Maybe you need to break it down more, make it more basic.  Compare
serial scan of your table to a serial scan of its keys done via TFIF.
For the scan of the table, you could use the shell.  Looks like you
can set caching in shell:

 hbase> count 't1', INTERVAL => 10, CACHE => 1000

> 12/02/10 11:01:21 INFO mapred.JobClient:     ROWS=10485700
> 12/02/10 11:01:21 INFO mapred.JobClient:     ELAPSED_TIME=1611746
> I captured the full ganglia for an RS during this if anyone can spot
> anything obvious (I am about to try and understand this myself):
>  http://dl.dropbox.com/u/608155/cacheSize-RS.png

Your cpu is idle a bunch in these graphs (presuming the job ran around
11:01).  Little to no wio.  When did the job run?  Maybe it should run
longer given the granularity of these graphs.

How long does the job run?  (The above elapsed time is for all mappers
aggregated IIRC).
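
A quick back-of-the-envelope on the counters quoted above, assuming ELAPSED_TIME is in milliseconds and is summed over all mappers (both are assumptions on my part). The mapper count passed to `avg_wall_seconds` is hypothetical; plug in the real one from the job page.

```python
ROWS = 10_485_700        # from the quoted job counters
ELAPSED_MS = 1_611_746   # from the quoted job counters; assumed ms, summed over mappers

def rows_per_mapper_second(rows: int, elapsed_ms: int) -> float:
    """Rows scanned per second of mapper time (aggregate, not wall clock)."""
    return rows / (elapsed_ms / 1000.0)

def avg_wall_seconds(elapsed_ms: int, mappers: int) -> float:
    """Average per-mapper run time if the aggregate is spread evenly over mappers."""
    return elapsed_ms / 1000.0 / mappers

print(rows_per_mapper_second(ROWS, ELAPSED_MS))
print(avg_wall_seconds(ELAPSED_MS, mappers=10))  # mappers=10 is a placeholder
```

The per-mapper average is what should be lined up against the ganglia timeline, not the aggregate.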

