Date: Tue, 14 Feb 2012 08:14:19 -0800
Subject: Re: strange PerformanceEvaluation behaviour
From: Stack
To: user@hbase.apache.org

On Tue, Feb 14, 2012 at 7:56 AM, Oliver Meyn (GBIF) wrote:
> 1) With a command line like 'hbase org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 10' I see 100 mappers spawned, rather than the expected 10. I expect 10 because that's what the usage text implies, and what the javadoc explicitly states - quoting from doMapReduce "Run as many maps as asked-for clients." The culprit appears to be the outer loop in writeInputFile which sets up 10 splits for every "asked-for client" - at least, if I'm reading it right. Is this somehow expected, or is that code leftover from some previous iteration/experiment?
>

Yeah. I'd expect ten clients, each to its own map, each doing 1M items. Looking at writeInputFile, it seems to be dividing the namespace by ten so, yeah, x10 mappers.

> 2) With that same randomWrite command line above, I would expect a resulting table with 10 * (1024 * 1024) rows (so 10485700 = roughly 10M rows). Instead what I'm seeing is that the randomWrite job reports writing that many rows (exactly) but running rowcounter against the table reveals only 6549899 rows. A second attempt to build the table produces slightly different results (e.g. 6627689). I see a similar discrepancy when using 50 instead of 10 clients (~35% smaller than expected). Key collision could explain it, but it seems pretty unlikely (given I only need e.g. 10M keys from a potential 2B).
>

Yeah, I'd think key overlap (print out the span for each mapper or check the file written by writeInputFile). Your clocks are all in sync?

St.Ack
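
A rough sanity check on the key-overlap theory, as a sketch only. It assumes keys are drawn uniformly at random with replacement, and it compares two hypothetical keyspaces: the ~2B range mentioned above, and a keyspace only as large as the number of rows written. Nothing in this thread confirms which one randomWrite actually uses; the function names below are made up for illustration.

```python
import math
import random


def expected_distinct(draws: int, keyspace: int) -> float:
    """Expected number of distinct keys after `draws` uniform picks,
    with replacement, from `keyspace` possible keys."""
    # keyspace * (1 - (1 - 1/keyspace) ** draws), computed via log1p for stability
    return keyspace * (1.0 - math.exp(draws * math.log1p(-1.0 / keyspace)))


def simulated_distinct(draws: int, keyspace: int, seed: int = 0) -> int:
    """Count distinct keys in one simulated run (small sizes only)."""
    rng = random.Random(seed)
    return len({rng.randrange(keyspace) for _ in range(draws)})


rows = 10 * 1024 * 1024  # 10 clients x 1M rows each

# Assumption A: keyspace is the ~2B range, so collisions are rare.
print(round(expected_distinct(rows, 2**31 - 1)))

# Assumption B: keyspace is only as large as the number of rows written.
print(round(expected_distinct(rows, rows)))

# Small-scale simulation of assumption B.
print(simulated_distinct(100_000, 100_000), round(expected_distinct(100_000, 100_000)))
```

Under assumption A almost every write lands on a fresh key, so a ~35% shortfall would be hard to explain by collisions. Under assumption B the expected distinct count is about 1 - 1/e of the writes, roughly 6.63M rows, which is in the neighbourhood of the rowcounter results reported above.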