Date: Wed, 20 Jul 2011 11:02:17 -0400
Subject: Re: performance improvment on regionserver.MemStore/updateColumnValue
From: Joey Echeverria
To: dev@hbase.apache.org

I would compare YCSB between the patched and unpatched
version for a more realistic workload than the unit tests provide.

-Joey

On Wed, Jul 20, 2011 at 10:49 AM, N Keywal wrote:
> Hello,
>
> Some words on the context: We're thinking about using HBase for a product
> we're developing. For this reason, I am currently looking at the HBase source
> code to understand how to debug & modify HBase. To start with something
> simple but useful, I am looking for performance improvements by profiling
> HBase during the execution of the unit tests. I expect that many of the
> hotspots found in the unit tests are also hotspots in real production. I
> plan to spend around 10 m.d on this until September.
>
>
> The method regionserver.MemStore/updateColumnValue seems heavily used, and is
> ultimately responsible for 30% of the time in the test subsets I am using.
>
>
> One bit of it can be optimized easily by changing the order of the
> conditions:
>
>       if (firstKv.matchingQualifier(kv)) {
>         if (kv.getType() == KeyValue.Type.Put.getCode()) {
>           now = Math.max(now, kv.getTimestamp());
>         }
>       }
>
> becomes:
>       if (kv.getType() == KeyValue.Type.Put.getCode() &&
>           kv.getTimestamp() > now &&
>           firstKv.matchingQualifier(kv)) {
>         now = kv.getTimestamp();
>       }
>
> As comparing the qualifier is much more expensive, we put it at the end.
> It improves performance by 3% (i.e., total execution time lowered by
> 3%).
>
>
> So first question: would you be interested in a patch for this kind of
> stuff?
>
>
>
> Second question (more technical...): in this method
> (regionserver.MemStore/updateColumnValue), I see:
>
>     KeyValue firstKv = KeyValue.createFirstOnRow(
>         row, family, qualifier);
>
>     [...]
>       while (it.hasNext()) {
>         KeyValue kv = it.next();
>
>         // if this isnt the row we are interested in, then bail:
>         if (!firstKv.matchingColumn(family, qualifier) ||
>             !firstKv.matchingRow(kv)) {
>           break; // rows dont match, bail.
>         }
>
>         [...]
>       }
>
> For the test "firstKv.matchingColumn(family, qualifier)", I don't see:
> 1) Why it is tested in the loop: as firstKv is not modified, the result
> won't change.
> 2) How the result can be 'false', as firstKv is initialized with the family
> and the parameters.
>
> Or is it shared for update in one way or another?
>
> If we can remove it, we gain another 2%...
>
>
> N.
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
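
For readers who want to try the condition reordering from the quoted message in isolation, here is a minimal sketch. It does not use HBase: Cell, ConditionOrderSketch, the PUT/DELETE codes, and the comparison counter are all hypothetical stand-ins invented for illustration. It only shows that, with short-circuit &&, putting the cheap type and timestamp checks first yields the same result while running the (presumably expensive) qualifier comparison less often. Whether that translates into the 3% reported above would still need a real benchmark such as YCSB.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a stored cell; this is NOT HBase's KeyValue.
final class Cell {
    // Arbitrary illustrative type codes, not HBase's real values.
    static final byte PUT = 1;
    static final byte DELETE = 2;

    final byte type;
    final long timestamp;
    final byte[] qualifier;

    Cell(byte type, long timestamp, String qualifier) {
        this.type = type;
        this.timestamp = timestamp;
        this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
    }
}

final class ConditionOrderSketch {
    // Counts byte-array comparisons so the two orderings can be compared.
    static int qualifierComparisons = 0;

    // Stand-in for the expensive check: a full byte-array comparison.
    static boolean matchingQualifier(Cell a, Cell b) {
        qualifierComparisons++;
        return Arrays.equals(a.qualifier, b.qualifier);
    }

    // Original ordering: the qualifier is compared for every cell.
    static long latestOriginal(Cell first, List<Cell> cells, long now) {
        for (Cell kv : cells) {
            if (matchingQualifier(first, kv)) {
                if (kv.type == Cell.PUT) {
                    now = Math.max(now, kv.timestamp);
                }
            }
        }
        return now;
    }

    // Reordered: cheap checks first; && skips the comparison when they fail.
    static long latestReordered(Cell first, List<Cell> cells, long now) {
        for (Cell kv : cells) {
            if (kv.type == Cell.PUT
                    && kv.timestamp > now
                    && matchingQualifier(first, kv)) {
                now = kv.timestamp;
            }
        }
        return now;
    }

    public static void main(String[] args) {
        Cell first = new Cell(Cell.PUT, 0L, "q");
        List<Cell> cells = Arrays.asList(
                new Cell(Cell.PUT, 5L, "q"),
                new Cell(Cell.DELETE, 9L, "q"),
                new Cell(Cell.PUT, 3L, "q"),
                new Cell(Cell.PUT, 8L, "other"),
                new Cell(Cell.PUT, 7L, "q"));

        qualifierComparisons = 0;
        long a = latestOriginal(first, cells, 4L);
        System.out.println("original:  now=" + a + " comparisons=" + qualifierComparisons);

        qualifierComparisons = 0;
        long b = latestReordered(first, cells, 4L);
        System.out.println("reordered: now=" + b + " comparisons=" + qualifierComparisons);
    }
}
```

The two methods return the same value because Math.max only changes now when the timestamp is strictly greater, which is exactly the extra condition the reordered version tests up front.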