hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-10015) Major performance improvement: Avoid synchronization in StoreScanner
Date Fri, 22 Nov 2013 05:18:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829649#comment-13829649 ]

Lars Hofhansl edited comment on HBASE-10015 at 11/22/13 5:16 AM:
-----------------------------------------------------------------

Unfortunately, I am no longer sure it actually works. I identified the problem:
In HStore.completeCompaction we call notifyChangedReadersObservers, which calls updateReaders
on all StoreScanners. Before this patch, this method would block if a StoreScanner was
in the middle of a next/seek/reseek/etc. So before this patch, one could rely on it being safe
to remove the compacted files once notifyChangedReadersObservers returned. That is no
longer true.
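To make the lost guarantee concrete, here is a heavily simplified Java model of the pre-patch locking. SyncScannerModel and CompactionModel are hypothetical names, not HBase classes, and strings stand in for store file readers. Because next() and updateReaders() synchronize on the same monitor, updateReaders() cannot return while another thread is inside next(), so deleting the compacted files after the notification loop is safe. Dropping that synchronization is what removes the ordering guarantee.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the pre-patch locking (not HBase code).
// Strings stand in for store file readers.
class SyncScannerModel {
    private List<String> files; // store files this scanner currently reads

    SyncScannerModel(List<String> files) {
        this.files = new ArrayList<>(files);
    }

    // Scanner thread: holds the monitor for the whole read.
    synchronized String next() {
        return files.isEmpty() ? null : files.get(0);
    }

    // Compaction thread: cannot enter while another thread is inside next(),
    // so when this returns, no read against the old file set is in flight.
    synchronized void updateReaders(List<String> newFiles) {
        this.files = new ArrayList<>(newFiles);
    }

    synchronized List<String> currentFiles() {
        return new ArrayList<>(files);
    }
}

class CompactionModel {
    // Mirrors the ordering in the comment: notify every scanner first,
    // then remove the compacted files.
    static void completeCompaction(List<SyncScannerModel> scanners,
                                   List<String> compactedFiles,
                                   List<String> newFiles,
                                   List<String> deleted) {
        for (SyncScannerModel s : scanners) {
            s.updateReaders(newFiles); // blocks while s is mid-next()
        }
        // Safe only because of the blocking above; without synchronization
        // a scanner could still be reading one of these files.
        deleted.addAll(compactedFiles);
    }
}
```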

I'll see if I can come up with something. It is a shame that we have to synchronize and incur
tens of millions of branch misses per second just so we can compact a few times a day.



was (Author: lhofhansl):
Unfortunately I am no longer sure it actually works. I identified the problem:
In HStore.completeCompaction we call notifyChangedReadersObservers, which calls updateReaders
on all StoreScanners. Before this patch, this method would block if there was a StoreScanner
inside a next/seek/reseek method. So before that patch one could guarantee that after notifyChangedReadersObservers
returns it is safe to remove the compacted files. That is no longer true.
I'll see if I can come up with something. It is a shame that we have to synchronize and get
tens of millions of branch mispredictions per second, just so we can compact a few times a
day.


> Major performance improvement: Avoid synchronization in StoreScanner
> --------------------------------------------------------------------
>
>                 Key: HBASE-10015
>                 URL: https://issues.apache.org/jira/browse/HBASE-10015
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.98.0, 0.96.1, 0.94.15
>
>         Attachments: 10015-0.94-v2.txt, 10015-0.94-withtest.txt, 10015-0.94.txt, 10015-trunk-v2.txt, 10015-trunk.txt, TestLoad.java
>
>
> Did some more profiling (this time with a sampling profiler), and StoreScanner.peek()
> showed up a lot in the samples. At first that was surprising, but peek is synchronized, so
> it seems much of the synchronization cost is paid there.
> It seems the only reason we have to synchronize all these methods is that a concurrent
> flush or compaction can change the scanner stack; other than that, only a single thread should
> access a StoreScanner at any given time.
> So I replaced updateReaders() with code that just signals to the scanner that its
> readers should be updated, and made it the using thread's responsibility to do the work.
> The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance
> improvement across all scenarios.
> Now, the hard part is reasoning about whether this is 100% correct. I ran TestAtomicOperation
> and TestAcidGuarantees a few times in a loop; all still pass.
> Will attach a sample patch.
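The approach in the issue description can be sketched as a minimal, hypothetical Java model. This is not the actual patch: FlagScannerModel and its nested Store interface are invented for illustration. updateReaders() only sets a volatile flag, and the single thread that owns the scanner notices the flag on its next call and reopens against the store's current file set, so the hot path takes no monitor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "flag, then let the owning thread do the work"
// idea from the issue description; not the actual HBASE-10015 patch.
class FlagScannerModel {
    // Hypothetical stand-in for the store: returns its current file set.
    interface Store {
        List<String> currentFiles();
    }

    private final Store store;
    private List<String> files;                      // touched only by the owning thread
    private volatile boolean readersChanged = false; // set by flush/compaction threads

    FlagScannerModel(Store store) {
        this.store = store;
        this.files = new ArrayList<>(store.currentFiles());
    }

    // Flush/compaction thread: just records that the file set changed.
    // Returns immediately; it does NOT wait for in-flight reads.
    void updateReaders() {
        readersChanged = true;
    }

    // Owning scanner thread only; no monitor on the hot path.
    String next() {
        if (readersChanged) {
            // Clear first, then re-read: a change that lands in between
            // sets the flag again and is picked up on the next call.
            readersChanged = false;
            files = new ArrayList<>(store.currentFiles());
        }
        return files.isEmpty() ? null : files.get(0);
    }
}
```

The flip side of this design is visible in updateReaders(): since it no longer blocks, the compaction thread cannot assume the scanner has stopped reading the old files when the call returns, which is exactly the problem raised in the edited comment.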



--
This message was sent by Atlassian JIRA
(v6.1#6144)
