hbase-issues mailing list archives

From "Kannan Muthukkaruppan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-6066) some low hanging read path improvement ideas
Date Tue, 22 May 2012 06:31:43 GMT

     [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan updated HBASE-6066:
-----------------------------------------

    Description: 
I was running some single-threaded scan performance tests against a fully cached table with
small rows. Some observations...

We seem to iterate over temporary lists, and build them, more often than necessary.

1) One such is the following code in HRegionServer.next():

{code}
   boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
   if (!values.isEmpty()) {
     // Wasteful in most cases: the sum is only needed when a size limit is set.
     for (KeyValue kv : values) {
       currentScanResultSize += kv.heapSize();
     }
     results.add(new Result(values));
   }
{code}

By default the "maxScannerResultSize" is Long.MAX_VALUE. In that case,
we can skip the iteration that computes currentScanResultSize.
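
As a rough illustration of the idea (not a patch), the sketch below guards the size accounting behind a check on the configured limit. SizedCell and ScanSizeAccounting are hypothetical stand-ins for KeyValue and the loop in HRegionServer.next(); only the comparison against Long.MAX_VALUE comes from the observation above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for KeyValue; only heapSize() matters here.
class SizedCell {
    private final long heapSize;

    SizedCell(long heapSize) {
        this.heapSize = heapSize;
    }

    long heapSize() {
        return heapSize;
    }
}

// Hypothetical helper, not part of HBase.
class ScanSizeAccounting {
    /**
     * Returns the updated running size. When the limit is Long.MAX_VALUE
     * (assumed to mean "no limit"), the cells are not iterated at all.
     */
    static long accumulate(long currentSize, long maxResultSize, List<SizedCell> values) {
        if (maxResultSize == Long.MAX_VALUE) {
            return currentSize; // no limit configured: skip the per-cell loop
        }
        for (SizedCell kv : values) {
            currentSize += kv.heapSize();
        }
        return currentSize;
    }

    public static void main(String[] args) {
        List<SizedCell> row = Arrays.asList(new SizedCell(40), new SizedCell(60));
        System.out.println(accumulate(0, Long.MAX_VALUE, row)); // prints 0
        System.out.println(accumulate(0, 1024, row));           // prints 100
    }
}
```

With an unbounded limit the running total is never compared against anything, so leaving it at its previous value should be harmless; that is an assumption about how the caller uses it.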


2) An example of a wasteful temporary array is "results" in
RegionScanner.next().

{code}
      results.clear();
      boolean returnResult = nextInternal(limit, metric);

      outResults.addAll(results);
{code}

results then gets copied over to outResults via an addAll(). Not sure why we can't collect
the results directly in outResults.

3) Another, very similar example of a wasteful array is "results" in StoreScanner.next(),
which eventually also copies its results into "outResults".
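
To make items 2 and 3 concrete, here is a minimal sketch that contrasts the buffer-then-copy pattern with filling the caller's list directly. DirectCollectScanner and its String cells are hypothetical; the real scanners work on KeyValue and take more parameters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical scanner over String cells; stands in for RegionScanner/StoreScanner.
class DirectCollectScanner {
    private final Iterator<String> cells;
    // Reusable intermediate buffer, as in the snippet above.
    private final List<String> results = new ArrayList<>();

    DirectCollectScanner(List<String> cells) {
        this.cells = cells.iterator();
    }

    // Pattern from the snippet: fill a private buffer, then copy it out.
    boolean nextWithCopy(List<String> outResults, int limit) {
        results.clear();
        boolean more = nextInternal(results, limit);
        outResults.addAll(results);
        return more;
    }

    // Sketch of the proposal: let the helper append to the caller's list.
    boolean nextDirect(List<String> outResults, int limit) {
        return nextInternal(outResults, limit);
    }

    // Appends up to 'limit' cells to 'sink'; returns true if cells remain.
    private boolean nextInternal(List<String> sink, int limit) {
        int added = 0;
        while (added < limit && cells.hasNext()) {
            sink.add(cells.next());
            added++;
        }
        return cells.hasNext();
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c");
        List<String> copied = new ArrayList<>();
        new DirectCollectScanner(data).nextWithCopy(copied, 2);
        List<String> direct = new ArrayList<>();
        new DirectCollectScanner(data).nextDirect(direct, 2);
        System.out.println(copied.equals(direct)); // prints true
    }
}
```

One behavioral difference to keep in mind: if the helper throws partway through, the direct version leaves partial results in the caller's list, whereas the copying version leaves it untouched. Whether any caller depends on that is not something this sketch can answer.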


4) Reduce overhead of "size metric" maintained in StoreScanner.next().

{code}
  if (metric != null) {
     HRegion.incrNumericMetric(this.metricNamePrefix + metric,
                               copyKv.getLength());
  }
  results.add(copyKv);
{code}

A single call to next() might fetch a lot of KVs. We can first add up the sizes of those KVs
in a local variable and then increment the metric once, in a finally clause, rather than
updating AtomicLongs for each KV.
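
A minimal sketch of that batching, under the assumption that the metric is backed by an AtomicLong: the loop sums sizes into a local variable and the shared counter is touched once per call. BatchedSizeMetric, the int[] of KV lengths and the metric parameter are all hypothetical stand-ins for StoreScanner.next() and HRegion.incrNumericMetric().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of HBase.
class BatchedSizeMetric {
    /**
     * Collects the given KV lengths into 'results' and publishes their total
     * to 'metric' with a single atomic update, even if the loop exits early.
     */
    static int collect(int[] kvLengths, List<Integer> results, AtomicLong metric) {
        long batchBytes = 0; // local accumulator: no contention, no atomic ops
        try {
            for (int len : kvLengths) {
                batchBytes += len;
                results.add(len);
            }
            return results.size();
        } finally {
            if (metric != null && batchBytes > 0) {
                metric.addAndGet(batchBytes); // one update per next() call
            }
        }
    }

    public static void main(String[] args) {
        AtomicLong metric = new AtomicLong();
        List<Integer> results = new ArrayList<>();
        collect(new int[] {10, 20, 30}, results, metric);
        System.out.println(metric.get()); // prints 60
    }
}
```

The finally block keeps the metric accurate for KVs already collected if the loop throws, which matches the per-KV behavior of the original code.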

5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized
methods. Synchronized methods calling nested synchronized methods on the same object probably
add some small overhead. The inner next() calls isFilterDone(), which is also a synchronized
method. We should refactor the code to avoid these nested synchronized calls.
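
One common way to do this is to keep the public entry points synchronized and move the shared logic into private helpers that assume the lock is already held. The sketch below shows that shape; ScannerLocking and its fields are hypothetical and only mirror the method names mentioned above.

```java
// Hypothetical scanner skeleton, not the HBase RegionScanner.
class ScannerLocking {
    private boolean filterDone = false;
    private int remaining;

    ScannerLocking(int rows) {
        this.remaining = rows;
    }

    // Public entry point: acquires the monitor once.
    public synchronized boolean next() {
        return nextLocked();
    }

    // Still synchronized for external callers.
    public synchronized boolean isFilterDone() {
        return isFilterDoneLocked();
    }

    // Caller must hold the monitor; no further lock acquisition happens here.
    private boolean nextLocked() {
        assert Thread.holdsLock(this);
        if (isFilterDoneLocked()) {
            return false;
        }
        remaining--;
        if (remaining <= 0) {
            filterDone = true;
        }
        return !filterDone; // true while more rows remain
    }

    // Caller must hold the monitor.
    private boolean isFilterDoneLocked() {
        return filterDone;
    }

    public static void main(String[] args) {
        ScannerLocking scanner = new ScannerLocking(2);
        System.out.println(scanner.next());         // prints true
        System.out.println(scanner.next());         // prints false
        System.out.println(scanner.isFilterDone()); // prints true
    }
}
```

Java monitors are reentrant, so the nested calls in the current code are correct; the refactoring only removes the repeated lock acquisitions. How much that saves in practice would need to be measured.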


  was:
I was runnign some single threaded scan performance tests for a table
with small sized rows that is fully cached. Some observations...

Several wasteful iterations over and/or building of temporary lists.
1) One such is the following:

{code}
   boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
   if (!values.isEmpty()) {
     for (KeyValue kv : values) {              ------> #### wasteful in most cases
       currentScanResultSize += kv.heapSize();
   }
   results.add(new Result(values));
{code}

By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
avoid the unnecessary iteration to compute currentScanResultSize.


2) An example of a wasteful temporary array, is "results" in
RegionScanner.next().

{code}
      results.clear();
      boolean returnResult = nextInternal(limit, metric);

      outResults.addAll(results);
{code}

results then gets copied over to outResults via an addAll().
Not sure why we can directly collect the results in outResults.

3) Another almost similar exmaple of a wasteful array is "results" in
StoreScanner.next(), which eventually also copies its results
into "outResults".


4) Reduce overhead of "size metric" maintained in StoreScanner.next().

{code}
  if (metric != null) {
     HRegion.incrNumericMetric(this.metricNamePrefix + metric,
                               copyKv.getLength());
  }
  results.add(copyKv);
{code}

A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs
in a local variable and then in a finally clause increment the metric one shot, rather than
updating AtomicLongs for each KV.

5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized
methods. Synchronized methods calling nested synchronized methods on the same object are probably
adding some small overhead. The inner next() calls isFilterDone() which is a also a synchronized
method. We should factor the code to avoid these nested synchronized methods.


    
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
