Return-Path:
X-Original-To: apmail-hbase-issues-archive@www.apache.org
Delivered-To: apmail-hbase-issues-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id B0C67177B2
    for ; Wed, 7 Jan 2015 13:16:34 +0000 (UTC)
Received: (qmail 26136 invoked by uid 500); 7 Jan 2015 13:16:35 -0000
Delivered-To: apmail-hbase-issues-archive@hbase.apache.org
Received: (qmail 26082 invoked by uid 500); 7 Jan 2015 13:16:35 -0000
Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Delivered-To: mailing list issues@hbase.apache.org
Received: (qmail 26070 invoked by uid 99); 7 Jan 2015 13:16:35 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
    by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 07 Jan 2015 13:16:35 +0000
Date: Wed, 7 Jan 2015 13:16:35 +0000 (UTC)
From: "Oliver Meyn (JIRA)"
To: issues@hbase.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (HBASE-5402) PerformanceEvaluation creates the wrong number of rows in randomWrite
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/HBASE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267609#comment-14267609 ]

Oliver Meyn commented on HBASE-5402:
------------------------------------

As nobody else has commented on this issue I don't think it's worth debating further - I'm fine with Close as Invalid.

> PerformanceEvaluation creates the wrong number of rows in randomWrite
> ---------------------------------------------------------------------
>
>                 Key: HBASE-5402
>                 URL: https://issues.apache.org/jira/browse/HBASE-5402
>             Project: HBase
>          Issue Type: Improvement
>          Components: test
>            Reporter: Oliver Meyn
>              Labels: beginner
>
> The command line 'hbase org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 10' should result in a table with 10 * (1024 * 1024) rows (so 10485760). Instead what happens is that the randomWrite job reports writing that many rows (exactly) but running rowcounter against the table reveals only e.g. 6549899 rows. A second attempt to build the table produced slightly different results (e.g. 6627689). I see a similar discrepancy when using 50 instead of 10 clients (~35% smaller than expected).
> Further experimentation reveals that the problem is key collision - by removing the % totalRows in getRandomRow I saw a reduction in collisions (table was ~8M rows instead of 6.6M). Replacing the random row key with UUIDs instead of Integers solved the problem and produced exactly 10485760 rows. But that makes the key size 16 bytes instead of the current 10, so I'm not sure that's an acceptable solution.
> Here's the UUID code I used:
>   public static byte[] format(final UUID uuid) {
>     long msb = uuid.getMostSignificantBits();
>     long lsb = uuid.getLeastSignificantBits();
>     byte[] buffer = new byte[16];
>     // Bytes 0-7: most significant bits, big-endian.
>     for (int i = 0; i < 8; i++) {
>       buffer[i] = (byte) (msb >>> 8 * (7 - i));
>     }
>     // Bytes 8-15: least significant bits, big-endian. The shift distance is
>     // negative here; Java uses only its low 6 bits for a long shift, so it
>     // is equivalent to 8 * (15 - i).
>     for (int i = 8; i < 16; i++) {
>       buffer[i] = (byte) (lsb >>> 8 * (7 - i));
>     }
>     return buffer;
>   }
> which is invoked within getRandomRow with
> return format(UUID.randomUUID());

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
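
The row counts reported in the issue are about what key collision alone would predict. As a minimal sketch, assuming getRandomRow effectively draws each key uniformly from [0, totalRows) (the issue only says it applies % totalRows, so the uniform model is an assumption), drawing n keys from a key space of size n should hit about n * (1 - 1/e) distinct keys, roughly 63.2% of n, or about 6.63 million for n = 10485760. The class and method names below are hypothetical and are not part of PerformanceEvaluation.

```java
import java.util.BitSet;
import java.util.Random;

// Hypothetical illustration only; not code from HBase.
final class KeyCollisionSketch {

    /** Draws `draws` keys uniformly from [0, keySpace) and counts the distinct keys hit. */
    public static int countDistinct(int keySpace, int draws, long seed) {
        Random rand = new Random(seed);
        BitSet seen = new BitSet(keySpace);
        for (int i = 0; i < draws; i++) {
            // Assumed model of "random int % totalRows": a uniform draw.
            seen.set(rand.nextInt(keySpace));
        }
        return seen.cardinality();
    }

    /** Expected number of distinct keys: keySpace * (1 - (1 - 1/keySpace)^draws). */
    public static double expectedDistinct(int keySpace, int draws) {
        // log1p/exp keeps precision when 1/keySpace is tiny.
        double missProbability = Math.exp(draws * Math.log1p(-1.0 / keySpace));
        return keySpace * (1.0 - missProbability);
    }

    public static void main(String[] args) {
        int totalRows = 10 * 1024 * 1024; // 10485760, as in the issue
        System.out.printf("simulated distinct rows: %d%n",
                countDistinct(totalRows, totalRows, 42L));
        System.out.printf("expected distinct rows:  %.0f%n",
                expectedDistinct(totalRows, totalRows));
    }
}
```

Under this model the shortfall does not depend on the number of clients, only on the ratio of writes to key-space size, which would be consistent with the similar ~35% gap seen with 50 clients.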