hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: strange PerformanceEvaluation behaviour
Date Tue, 14 Feb 2012 16:14:19 GMT
On Tue, Feb 14, 2012 at 7:56 AM, Oliver Meyn (GBIF) <omeyn@gbif.org> wrote:
> 1) With a command line like 'hbase org.apache.hadoop.hbase.PerformanceEvaluation randomWrite
> 10' I see 100 mappers spawned, rather than the expected 10.  I expect 10 because that's what
> the usage text implies, and what the javadoc explicitly states - quoting from doMapReduce:
> "Run as many maps as asked-for clients."  The culprit appears to be the outer loop in
> writeInputFile, which sets up 10 splits for every "asked-for client" - at least, if I'm reading
> it right.  Is this somehow expected, or is that code leftover from some previous
> iteration/experiment?

Yeah.  I'd expect ten clients, each with its own map, each doing 1M items.

Looking at writeInputFile, it seems to be dividing each client's range by
ten, so yeah, x10 mappers.
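A minimal sketch of that nested-loop pattern (names and constants here are illustrative, not the actual HBase source) shows where the x10 comes from:

```python
ROWS_PER_CLIENT = 1024 * 1024  # 1M rows per asked-for client

def make_splits(num_clients):
    """Mimic a writeInputFile-style nested loop that emits ten input
    splits per asked-for client (hypothetical reconstruction)."""
    splits = []
    rows_per_split = ROWS_PER_CLIENT // 10  # each split covers a tenth
    for i in range(10):                     # the suspect x10 outer loop
        for j in range(num_clients):        # one pass per client
            start = j * ROWS_PER_CLIENT + i * rows_per_split
            splits.append((start, start + rows_per_split))
    return splits

print(len(make_splits(10)))  # 100 splits -> 100 map tasks, not 10
```

Since MapReduce runs one map task per input split, ten splits per client is exactly the 10 clients -> 100 mappers behaviour reported above.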

> 2) With that same randomWrite command line above, I would expect a resulting table with
> 10 * (1024 * 1024) rows (so 10485760 = roughly 10M rows).  Instead what I'm seeing is that
> the randomWrite job reports writing exactly that many rows, but running rowcounter against
> the table reveals only 6549899 rows.  A second attempt to build the table produces slightly
> different results (e.g. 6627689).  I see a similar discrepancy when using 50 instead of 10
> clients (~35% smaller than expected).  Key collision could explain it, but it seems pretty
> unlikely (given I only need e.g. 10M keys from a potential 2B).

Yeah, I'd think key overlap (print out the span for each mapper, or
check the file written by writeInputFile).
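If the mappers' key spans do overlap completely — i.e. each of the ~10M keys is drawn uniformly from the full [0, totalRows) range rather than from a disjoint per-mapper span — then the shortfall is what the classic occupancy (birthday-style) formula predicts. A quick arithmetic check, assuming that overlapping-span scenario:

```python
import math

# k uniform random draws from a keyspace of size N leave an expected
#   E[distinct] = N * (1 - (1 - 1/N)**k)
# distinct keys.  Here k == N == the requested number of writes.
n = 10 * 1024 * 1024  # 10485760 writes, keyspace of the same size
expected_distinct = n * (1 - (1 - 1 / n) ** n)

# For large n this fraction approaches 1 - 1/e, i.e. ~63.2% distinct,
# which lines up with the ~6.5-6.6M rowcounter results reported above.
print(round(expected_distinct))
print(round(100 * expected_distinct / n, 1))
```

So a ~35% shortfall is not a fluke but the expected signature of fully overlapping random key spans.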

Your clocks are all in sync?

