hbase-issues mailing list archives

From "Jean-Daniel Cryans (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5402) PerformanceEvaluation creates the wrong number of rows in randomWrite
Date Thu, 16 Feb 2012 17:45:02 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209531#comment-13209531

Jean-Daniel Cryans commented on HBASE-5402:

bq. The problem is that the resulting table is what is used in any subsequent scan tests using
PE, which then double-read some rows rather than reading every row once

Following this logic, how would a random read test work with keys that are UUIDs? You'll have
to be lucky to get a couple of hits :)

bq. This is counter intuitive, and also introduces the possibility of cache hits, which I
think is not what is expected by users doing a scan test.

Considering that blocks are 64KB and rows are ~1.5KB (key + value), cache hits are going to
happen no matter what: dozens of rows share every block.
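To put rough numbers on that (a back-of-the-envelope sketch, assuming the default 64KB HFile block size; class and variable names are illustrative):

```java
public class RowsPerBlock {
    public static void main(String[] args) {
        int blockSize = 64 * 1024; // default HFile block size
        int rowSize = 1536;        // ~1.5KB of key + value, per the estimate above
        // Roughly 42 rows share each block, so even a "random" read workload
        // keeps touching blocks that are already in the block cache.
        System.out.println("rows per block: " + blockSize / rowSize);
    }
}
```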
> PerformanceEvaluation creates the wrong number of rows in randomWrite
> ---------------------------------------------------------------------
>                 Key: HBASE-5402
>                 URL: https://issues.apache.org/jira/browse/HBASE-5402
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>            Reporter: Oliver Meyn
> The command line 'hbase org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 10'
should result in a table with 10 * (1024 * 1024) rows (so 10485760).  Instead, the randomWrite
job reports writing exactly that many rows, but running rowcounter against the table reveals
only e.g. 6549899 rows.  A second attempt to build the table produced slightly different
results (e.g. 6627689).  I see a similar discrepancy when using 50 clients instead
of 10 (~35% fewer rows than expected).
> Further experimentation reveals that the problem is key collisions: by removing the %
totalRows in getRandomRow I saw a reduction in collisions (the table had ~8M rows instead of 6.6M).
 Replacing the random row key with UUIDs instead of Integers solved the problem and produced
exactly 10485760 rows.  But that makes the key 16 bytes instead of the current 10, so
I'm not sure that's an acceptable solution.
> Here's the UUID code I used:
>   public static byte[] format(final UUID uuid) {
>     long msb = uuid.getMostSignificantBits();
>     long lsb = uuid.getLeastSignificantBits();
>     byte[] buffer = new byte[16];
>     for (int i = 0; i < 8; i++) {
>       buffer[i] = (byte) (msb >>> 8 * (7 - i));
>     }
>     for (int i = 8; i < 16; i++) {
>       // (15 - i) keeps the shift count non-negative; (7 - i) would only
>       // work because Java masks long shift counts to their low 6 bits
>       buffer[i] = (byte) (lsb >>> 8 * (15 - i));
>     }
>     return buffer;
>   }
> which is invoked within getRandomRow with 
> return format(UUID.randomUUID());
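For reference, the ~6.6M row counts reported above are exactly what sampling with replacement predicts: drawing N keys uniformly from a space of N values (which is what % totalRows does) yields about N * (1 - 1/e) distinct keys. A quick sketch of that estimate (class name is illustrative):

```java
public class DistinctKeyEstimate {
    public static void main(String[] args) {
        // 10 clients * 1024 * 1024 rows each, keys drawn uniformly
        // with replacement from [0, totalRows) via % totalRows
        double n = 10.0 * 1024 * 1024; // 10485760
        // Expected distinct keys: n * (1 - (1 - 1/n)^n), which tends to n * (1 - 1/e)
        double expectedDistinct = n * (1 - Math.pow(1 - 1 / n, n));
        System.out.printf("expected distinct rows: %.0f%n", expectedDistinct);
    }
}
```

This prints roughly 6.63M, consistent with the 6549899 and 6627689 rowcounter results in the description.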

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

