hbase-issues mailing list archives

From "Vladimir Rodionov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
Date Thu, 21 Nov 2013 03:04:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828439#comment-13828439 ]

Vladimir Rodionov commented on HBASE-10015:
-------------------------------------------

Maybe I am wrong (an empty synchronized method call costs about 25 ns on my laptop), but my own tests
on StoreScanner show zero improvement.
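
For reference, a figure like the 25 ns above can be approximated with a timing loop along the following lines. This is only a rough sketch, not the measurement used for this comment: the class and method names are hypothetical, and a naive loop like this is easily distorted by JIT inlining and lock optimizations, so a harness such as JMH would be more trustworthy.

```java
// Hypothetical sketch: compares an uncontended synchronized call with a plain call.
class SyncCostSketch {
    private long counter;
    // Written once per run so the JIT cannot discard the loop as dead code.
    static volatile long sink;

    // Plain and synchronized variants of the same trivial method.
    long plainNext() { return ++counter; }
    synchronized long syncNext() { return ++counter; }

    /** Returns the average nanoseconds per call over the given number of iterations. */
    static double nanosPerCall(SyncCostSketch s, boolean sync, int iterations) {
        long local = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            local += sync ? s.syncNext() : s.plainNext();
        }
        long elapsed = System.nanoTime() - start;
        sink = local;
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        SyncCostSketch s = new SyncCostSketch();
        int iterations = 10_000_000;
        // Warm-up passes so both paths are compiled before timing.
        nanosPerCall(s, false, iterations);
        nanosPerCall(s, true, iterations);
        System.out.printf("plain:        %.2f ns/call%n", nanosPerCall(s, false, iterations));
        System.out.printf("synchronized: %.2f ns/call%n", nanosPerCall(s, true, iterations));
    }
}
```

The single-threaded loop only measures the uncontended case, which matches the situation described in the issue (one thread using a StoreScanner at a time).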

My test code is simple:

create a region, populate it with data (making sure the data is in the cache), then run:

{code}
LOG.info("Test store scanner");
Scan scan = new Scan();
scan.setStartRow(region.getStartKey());
scan.setStopRow(region.getEndKey());
Store store = region.getStore(CF);
StoreScanner scanner = new StoreScanner(store, store.getScanInfo(), scan, null);
long start = System.currentTimeMillis();
int total = 0;
List<KeyValue> result = new ArrayList<KeyValue>();
while (scanner.next(result)) {
  total++;
  result.clear();
}

LOG.info("Test store scanner finished. Found " + total + " in " + (System.currentTimeMillis() - start) + "ms");
{code}

This test shows exactly the same time for both the default StoreScanner and the *unsynchronized* StoreScanner.
The scan is not very fast: 1-1.5M rows per second (rows are relatively small: 1 CF + 5 CQ, ~120 bytes).
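
As a rough back-of-envelope check on these numbers: at 1-1.5M rows per second the scan spends about 1000 ns to 667 ns per row, so a single uncontended 25 ns synchronized call accounts for roughly 2.5% to 3.75% of the per-row time. The sketch below does that arithmetic. The number of synchronized calls per row is not measured anywhere in this comment, so `syncCallsPerRow` is a purely hypothetical parameter, as are the class and method names.

```java
// Hypothetical sketch: share of per-row scan time attributable to synchronized calls.
class SyncShareSketch {
    /** Fraction of the per-row time budget spent in uncontended synchronized calls. */
    static double syncShare(double rowsPerSecond, double nanosPerSyncCall, int syncCallsPerRow) {
        double nanosPerRow = 1e9 / rowsPerSecond; // e.g. 1000 ns at 1M rows/s
        return syncCallsPerRow * nanosPerSyncCall / nanosPerRow;
    }

    public static void main(String[] args) {
        double[] rates = {1_000_000, 1_500_000}; // observed range: 1-1.5M rows/s
        int[] callsPerRow = {1, 10};             // assumption, not a measured value
        for (double rate : rates) {
            for (int calls : callsPerRow) {
                System.out.printf("%.1fM rows/s, %d sync call(s)/row: %.2f%% of row time%n",
                        rate / 1e6, calls, 100 * syncShare(rate, 25.0, calls));
            }
        }
    }
}
```

Under the 25 ns assumption, a 3x speedup would need roughly 18 to 27 such calls removed per row, which is one way to read why this test sees no difference.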

 

> Major performance improvement: Avoid synchronization in StoreScanner
> --------------------------------------------------------------------
>
>                 Key: HBASE-10015
>                 URL: https://issues.apache.org/jira/browse/HBASE-10015
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>         Attachments: 10015-0.94.txt, TestLoad.java
>
>
> Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek is synchronized, so it seems a lot of the sync'ing cost is eaten there.
> It seems the only reason we have to synchronize all these methods is because a concurrent flush or compaction can change the scanner stack, other than that only a single thread should access a StoreScanner at any given time.
> So replaced updateReaders() with some code that just indicates to the scanner that the readers should be updated and then make it the using thread's responsibility to do the work.
> The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios.
> Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass.
> Will attach a sample patch.
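
The idea in the quoted description (the flush or compaction thread only signals that the readers changed, and the scanning thread applies the change itself) can be illustrated with a minimal sketch. This is not the attached 10015-0.94.txt patch and does not use HBase classes: `DeferredUpdateSketch`, `requestUpdate` and `currentReaders` are hypothetical names, and readers are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of "signal now, apply later on the using thread".
class DeferredUpdateSketch {
    // Readers in use by the scanning thread; only that thread touches this field.
    private List<String> readers;
    // Replacement readers published by a flush/compaction thread, or null if none is pending.
    private final AtomicReference<List<String>> pending = new AtomicReference<>();

    DeferredUpdateSketch(List<String> initial) {
        readers = new ArrayList<>(initial);
    }

    /** Called by the flush/compaction thread: publishes the new readers and returns. */
    void requestUpdate(List<String> newReaders) {
        pending.set(new ArrayList<>(newReaders));
    }

    /** Called by the single scanning thread before each use of the readers. */
    List<String> currentReaders() {
        List<String> update = pending.getAndSet(null);
        if (update != null) {
            readers = update; // the swap happens on the scanning thread
        }
        return readers;
    }
}
```

The hot path then costs one atomic read-and-clear instead of a monitor on every method. The sketch leaves out the part the description calls hard: re-seeking the scanner to its previous position after the swap and arguing that this is correct under concurrent flushes.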



--
This message was sent by Atlassian JIRA
(v6.1#6144)
