hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
Date Mon, 25 Nov 2013 03:33:37 GMT

     [ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Lars Hofhansl updated HBASE-10015:

    Attachment: 10015-0.94-new-sample.txt

Here's yet another sample patch (for 0.94). Adds a setter for a syncObject to KeyValueScanner.
(if I wanted to change the StoreScanner constructor then we need to change the coprocessor
region observer to pass this in as well, so I opted for a setter instead). When a StoreScanner
is created by a RegionScanner it passes a reference to this as the syncObject.
So now all locking is - when needed - done through the RegionScannerImpl and I see the same
performance improvement.
As said above StoreScanner for flushes and compactions are never invalidated anyway.

We have coarsened the lock from StoreScanner.next/peek/seek/etc to RegionScannerImpl.next/peek/seek/etc.

Now, this is not really pretty. Is this worth the performance gain for tall table scans? Up
to 2x for really tall tables with small KVs, proportionally less for wider tables and larger

> Major performance improvement: Avoid synchronization in StoreScanner
> --------------------------------------------------------------------
>                 Key: HBASE-10015
>                 URL: https://issues.apache.org/jira/browse/HBASE-10015
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>         Attachments: 10015-0.94-new-sample.txt, 10015-0.94-v2.txt, 10015-0.94-v3.txt,
10015-0.94-v4.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk-v3.txt,
10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk-v4.txt, 10015-trunk.txt, TestLoad.java
> Did some more profiling (this time with a sampling profiler) and StoreScanner.peek()
showed up a lot in the samples. At first that was surprising, but peek is synchronized, so
it seems a lot of the sync'ing cost is eaten there.
> It seems the only reason we have to synchronize all these methods is because a concurrent
flush or compaction can change the scanner stack, other than that only a single thread should
access a StoreScanner at any given time.
> So replaced updateReaders() with some code that just indicates to the scanner that the
readers should be updated and then make it the using thread's responsibility to do the work.
> The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance
improvement across all scenarios.
> Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation
and TestAcidGuarantees a few times in a loop, all still pass.
> Will attach a sample patch.

This message was sent by Atlassian JIRA

View raw message