Date: Wed, 20 Jul 2011 11:32:20 -0400
Subject: Re: performance improvment on regionserver.MemStore/updateColumnValue
From: Joey Echeverria
To: dev@hbase.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=002354186af42ae1d204a881f04d

--002354186af42ae1d204a881f04d
Content-Type: text/plain; charset=ISO-8859-1

My suggestion
was just to verify the performance gains. Do the profiling on unit tests
and then do scale-up tests with YCSB.

-Joey

On Wed, Jul 20, 2011 at 11:28 AM, N Keywal wrote:

> Agreed. But there is a big advantage when you work on the issues found in
> the unit tests: the code you're modifying is already covered by the unit
> tests... :-)
>
> On Wed, Jul 20, 2011 at 5:02 PM, Joey Echeverria wrote:
>
> > I would compare YCSB between the patched and unpatched version for a more
> > realistic workload than the unit tests provide.
> >
> > -Joey
> >
> > On Wed, Jul 20, 2011 at 10:49 AM, N Keywal wrote:
> >
> > > Hello,
> > >
> > > Some words on the context: We're thinking about using HBase for a
> > > product we're developing. For this reason, I am currently looking at
> > > the HBase source code to understand how to debug & modify HBase. To
> > > start with something simple but useful, I am looking for performance
> > > improvements by profiling HBase during the execution of the unit tests.
> > > I expect that many of the hotspots found in the unit tests are also
> > > hotspots in real production. I plan to spend around 10 man-days on this
> > > until September.
> > >
> > >
> > > The method regionserver.MemStore/updateColumnValue seems to be heavily
> > > used, and is ultimately responsible for 30% of the time in the test
> > > subsets I am using.
> > >
> > >
> > > There is a bit of it that can be optimized easily by changing the order
> > > of the conditions:
> > >
> > >     if (firstKv.matchingQualifier(kv)) {
> > >       if (kv.getType() == KeyValue.Type.Put.getCode()) {
> > >         now = Math.max(now, kv.getTimestamp());
> > >       }
> > >     }
> > >
> > > becomes:
> > >
> > >     if (kv.getType() == KeyValue.Type.Put.getCode() &&
> > >         kv.getTimestamp() > now &&
> > >         firstKv.matchingQualifier(kv)) {
> > >       now = kv.getTimestamp();
> > >     }
> > >
> > > As comparing the qualifier is much more expensive, we put it at the end.
> > > It improves performance by 3% (i.e., total execution time is lowered by
> > > 3%).
> > >
> > > So first question: would you be interested in a patch for this kind of
> > > thing?
> > >
> > >
> > > Second question (more technical...): in this method
> > > (regionserver.MemStore/updateColumnValue), I see:
> > >
> > >     KeyValue firstKv = KeyValue.createFirstOnRow(
> > >         row, family, qualifier);
> > >
> > >     [...]
> > >     while (it.hasNext()) {
> > >       KeyValue kv = it.next();
> > >
> > >       // if this isnt the row we are interested in, then bail:
> > >       if (!firstKv.matchingColumn(family, qualifier) ||
> > >           !firstKv.matchingRow(kv)) {
> > >         break; // rows dont match, bail.
> > >       }
> > >
> > >       [...]
> > >     }
> > >
> > > For the test "firstKv.matchingColumn(family, qualifier)", I don't see:
> > > 1) Why it is tested in the loop: as firstKv is not modified, the result
> > >    won't change.
> > > 2) How the result can be 'false', as firstKv is initialized with the
> > >    family and the qualifier passed as parameters.
> > >
> > > Or is it shared and updated in one way or another?
> > >
> > > If we can remove it, we gain another 2%...
> > >
> > >
> > > N.
> > >
> >
> >
> > --
> > Joseph Echeverria
> > Cloudera, Inc.
> > 443.305.9434
> >
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434

--002354186af42ae1d204a881f04d--
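
For reference, the condition reordering discussed in the thread can be
checked in isolation. What follows is only a minimal sketch, not HBase code:
Cell, CellType, matchingQualifier and the sample data are hypothetical
stand-ins for KeyValue and its methods, and the surrounding loop is an
assumption. It shows that both orderings compute the same timestamp, while
the reordered form lets && short-circuit before the expensive qualifier
comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Stand-alone sketch; none of these types are HBase classes. */
public class ConditionOrderSketch {

    /** Hypothetical stand-in for a cell type code. */
    public enum CellType { PUT, DELETE }

    /** Hypothetical, simplified stand-in for a KeyValue. */
    public static final class Cell {
        final CellType type;
        final long timestamp;
        final byte[] qualifier;

        public Cell(CellType type, long timestamp, String qualifier) {
            this.type = type;
            this.timestamp = timestamp;
            this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Counts how often the (comparatively expensive) byte comparison runs. */
    public static int qualifierComparisons = 0;

    static boolean matchingQualifier(byte[] wanted, Cell cell) {
        qualifierComparisons++;
        return Arrays.equals(wanted, cell.qualifier);
    }

    /** Original order: the qualifier is compared first, for every cell. */
    public static long originalOrder(List<Cell> cells, byte[] wanted, long now) {
        for (Cell cell : cells) {
            if (matchingQualifier(wanted, cell)) {
                if (cell.type == CellType.PUT) {
                    now = Math.max(now, cell.timestamp);
                }
            }
        }
        return now;
    }

    /** Reordered: cheap checks first; && skips the comparison when they fail. */
    public static long reorderedChecks(List<Cell> cells, byte[] wanted, long now) {
        for (Cell cell : cells) {
            if (cell.type == CellType.PUT
                    && cell.timestamp > now
                    && matchingQualifier(wanted, cell)) {
                now = cell.timestamp;
            }
        }
        return now;
    }

    /** Made-up sample data, chosen only to exercise each branch. */
    public static List<Cell> sampleCells() {
        return Arrays.asList(
            new Cell(CellType.PUT, 50L, "q1"),
            new Cell(CellType.DELETE, 200L, "q1"),
            new Cell(CellType.PUT, 150L, "q2"),
            new Cell(CellType.PUT, 120L, "q1"),
            new Cell(CellType.PUT, 110L, "q1"));
    }

    public static void main(String[] args) {
        byte[] wanted = "q1".getBytes(StandardCharsets.UTF_8);
        List<Cell> cells = sampleCells();

        qualifierComparisons = 0;
        long a = originalOrder(cells, wanted, 100L);
        int originalCount = qualifierComparisons;

        qualifierComparisons = 0;
        long b = reorderedChecks(cells, wanted, 100L);
        int reorderedCount = qualifierComparisons;

        System.out.println("original:  now=" + a + ", comparisons=" + originalCount);
        System.out.println("reordered: now=" + b + ", comparisons=" + reorderedCount);
    }
}
```

On this sample data both methods return 120; the original order performs 5
qualifier comparisons and the reordered one 2. How much that saves in a real
region server depends on the workload, which is why comparing patched and
unpatched builds under YCSB is still worthwhile.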