hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From N Keywal <nkey...@gmail.com>
Subject performance improvment on regionserver.MemStore/updateColumnValue
Date Wed, 20 Jul 2011 14:49:13 GMT
Hello,

Some words on the context: We're thinking about using HBase for a product
we're developping. For this reason, I am currently looking at HBase source
code to understand  how to debug & modify HBase.  To start with something
simple but useful, I am looking for performance improvement by profiling
hbase during the execution of the unit tests. I expect that many of the
hotspots found on the unit tests are also hotspots in real production. I
plan to spend around 10 m.d on this until september.


The method regionserver.MemStore/updateColumnValue seems quite used, and is
ultimatly responsible of 30% of the time in the test subsets I am using.


There is bit of it that can be optimized easily by changing the conditions
order:

        if (firstKv.matchingQualifier(kv)) {
          if (kv.getType() == KeyValue.Type.Put.getCode()) {
            now = Math.max(now, kv.getTimestamp());
          }
        }

 becomes:
                if (kv.getType() == KeyValue.Type.Put.getCode() &&
                        kv.getTimestamp() > now &&
                        firstKv.matchingQualifier(kv)) {
                    now = kv.getTimestamp();
                }

 As comparing the qualifier is much more expensive, we put it at the end.
 It improve the performances by 3% (i.e: total execution time lowered by
3%).


 So first question: would you be interested by a patch for this kind of
stuff?



 Second question (more technical...): in this method
(regionserver.MemStore/updateColumnValue), I see:

            KeyValue firstKv = KeyValue.createFirstOnRow(
                    row, family, qualifier);

            [...]
            while (it.hasNext()) {
                KeyValue kv = it.next();

                // if this isnt the row we are interested in, then bail:
                if (!firstKv.matchingColumn(family, qualifier) ||
!firstKv.matchingRow(kv)) {
                    break; // rows dont match, bail.
                }

                [...]
            }

   For the test "firstKv.matchingColumn(family, qualifier)", I don't see:
   1) Why it is tested in the loop, as firstKv is not modified, the result
won't change.
   2) How the result can be 'false', as firstKv is inialized with the family
and the parameters.

   Or is it shared for update a way or another?

   If we can remove it, we gain another 2%...


   N.

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message