hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From N Keywal <nkey...@gmail.com>
Subject Re: performance improvment on regionserver.MemStore/updateColumnValue
Date Wed, 20 Jul 2011 15:28:35 GMT
Aggreed. But there is a big advantage when you work on the issues found on
the unit tesst: the code you're modifying is already covered by the unit
tests... :-)

On Wed, Jul 20, 2011 at 5:02 PM, Joey Echeverria <joey@cloudera.com> wrote:

> I would compare YCSB between the patched and unpatched version for a more
> realistic workload than the unit tests provide.
>
> -Joey
>
> On Wed, Jul 20, 2011 at 10:49 AM, N Keywal <nkeywal@gmail.com> wrote:
>
> > Hello,
> >
> > Some words on the context: We're thinking about using HBase for a product
> > we're developping. For this reason, I am currently looking at HBase
> source
> > code to understand  how to debug & modify HBase.  To start with something
> > simple but useful, I am looking for performance improvement by profiling
> > hbase during the execution of the unit tests. I expect that many of the
> > hotspots found on the unit tests are also hotspots in real production. I
> > plan to spend around 10 m.d on this until september.
> >
> >
> > The method regionserver.MemStore/updateColumnValue seems quite used, and
> is
> > ultimatly responsible of 30% of the time in the test subsets I am using.
> >
> >
> > There is bit of it that can be optimized easily by changing the
> conditions
> > order:
> >
> >        if (firstKv.matchingQualifier(kv)) {
> >          if (kv.getType() == KeyValue.Type.Put.getCode()) {
> >            now = Math.max(now, kv.getTimestamp());
> >          }
> >        }
> >
> >  becomes:
> >                if (kv.getType() == KeyValue.Type.Put.getCode() &&
> >                        kv.getTimestamp() > now &&
> >                        firstKv.matchingQualifier(kv)) {
> >                    now = kv.getTimestamp();
> >                }
> >
> >  As comparing the qualifier is much more expensive, we put it at the end.
> >  It improve the performances by 3% (i.e: total execution time lowered by
> > 3%).
> >
> >
> >  So first question: would you be interested by a patch for this kind of
> > stuff?
> >
> >
> >
> >  Second question (more technical...): in this method
> > (regionserver.MemStore/updateColumnValue), I see:
> >
> >            KeyValue firstKv = KeyValue.createFirstOnRow(
> >                    row, family, qualifier);
> >
> >            [...]
> >            while (it.hasNext()) {
> >                KeyValue kv = it.next();
> >
> >                // if this isnt the row we are interested in, then bail:
> >                if (!firstKv.matchingColumn(family, qualifier) ||
> > !firstKv.matchingRow(kv)) {
> >                    break; // rows dont match, bail.
> >                }
> >
> >                [...]
> >            }
> >
> >   For the test "firstKv.matchingColumn(family, qualifier)", I don't see:
> >   1) Why it is tested in the loop, as firstKv is not modified, the result
> > won't change.
> >   2) How the result can be 'false', as firstKv is inialized with the
> family
> > and the parameters.
> >
> >   Or is it shared for update a way or another?
> >
> >   If we can remove it, we gain another 2%...
> >
> >
> >   N.
> >
>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message