hbase-user mailing list archives

From "Oliver Meyn (GBIF)" <om...@gbif.org>
Subject strange PerformanceEvaluation behaviour
Date Tue, 14 Feb 2012 15:56:38 GMT
Hi all,

I've been trying to run a battery of tests to really understand our cluster's performance,
and I'm employing PerformanceEvaluation to do that (picking up where Tim Robertson left off,
elsewhere on the list).  I'm seeing two strange things that I hope someone can help with:

1) With a command line like 'hbase org.apache.hadoop.hbase.PerformanceEvaluation randomWrite
10' I see 100 mappers spawned, rather than the expected 10.  I expect 10 because that's what
the usage text implies, and what the javadoc explicitly states - quoting from doMapReduce
"Run as many maps as asked-for clients."  The culprit appears to be the outer loop in writeInputFile
which sets up 10 splits for every "asked-for client" - at least, if I'm reading it right.
 Is this somehow expected, or is that code leftover from some previous iteration/experiment?
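To make the mapper-count behaviour concrete, here is a minimal sketch of the loop shape I think I'm seeing in writeInputFile. This is my paraphrase, not the actual HBase source; the class and method names here are illustrative only:

```java
// Sketch (not the real HBase code) of how an outer loop of 10 around the
// per-client split writer would produce 10 * numClients input splits,
// and therefore 10 * numClients mappers, instead of numClients.
public class SplitCountSketch {
    static int countSplits(int numClients) {
        int splits = 0;
        for (int i = 0; i < 10; i++) {            // suspected outer loop
            for (int j = 0; j < numClients; j++) { // one split per asked-for client
                splits++;                          // each input split becomes one mapper
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // With 10 asked-for clients this yields 100, matching what I observe.
        System.out.println(countSplits(10));
    }
}
```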

2) With that same randomWrite command line above, I would expect a resulting table with 10
* (1024 * 1024) rows (so 10485760 = roughly 10M rows).  Instead what I'm seeing is that the
randomWrite job reports writing that many rows (exactly) but running rowcounter against the
table reveals only 6549899 rows.  A second attempt to build the table produces slightly different
results (e.g. 6627689).  I see a similar discrepancy when using 50 instead of 10 clients (~35%
smaller than expected).  Key collision could explain it, but it seems pretty unlikely (given
I only need e.g. 10M keys from a potential 2B).
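For what it's worth, here's the back-of-envelope collision estimate behind that last sentence. The numbers and the uniform-key assumption are mine, not taken from the PerformanceEvaluation code:

```java
// Back-of-envelope arithmetic: expected number of distinct keys when
// drawing n keys uniformly at random from a space of N possible keys is
// N * (1 - (1 - 1/N)^n); collisions are n minus that.
public class CollisionEstimate {
    static double expectedDistinct(double n, double N) {
        return N * (1.0 - Math.pow(1.0 - 1.0 / N, n));
    }

    public static void main(String[] args) {
        double n = 10.0 * 1024 * 1024; // ~10.5M keys written
        double N = 2.0e9;              // ~2B possible keys (my assumption)
        double collisions = n - expectedDistinct(n, N);
        // Roughly n^2 / (2N), i.e. on the order of tens of thousands --
        // nowhere near enough to explain ~3.9M missing rows.
        System.out.printf("expected collisions: %.0f%n", collisions);
    }
}
```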

Any and all input appreciated.


Oliver Meyn
Software Developer
Global Biodiversity Information Facility (GBIF)
+45 35 32 15 12
