hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner
Date Sun, 22 Feb 2015 05:11:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332025#comment-14332025 ]

Lars Hofhansl edited comment on HBASE-13082 at 2/22/15 5:10 AM:
----------------------------------------------------------------

Quick note on why this works:
# StoreScanner is passed an explicit object to sync on in updateReaders (it does not care what this object is, just that it needs to sync on it).
# We pass the RegionScannerImpl object down as that "sync" object.
# All operations that call into any StoreScanner method are already synchronized on the RegionScannerImpl (except for nextRaw, which requires the caller to do the locking).
# Now any region scanner operation will prevent the readers from being updated (see the sketch below).
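
A hypothetical, simplified sketch of the mechanism (invented names and signatures, not the actual patch code):

{code:java}
import java.util.List;

class StoreScanner {
  private final Object syncObject; // whatever monitor the creator hands in

  StoreScanner(Object syncObject) {
    this.syncObject = syncObject;
  }

  void updateReaders() {
    // called by flush/compaction; mutually exclusive with scans because
    // both sides synchronize on the same object
    synchronized (syncObject) {
      // switch to the new set of store files here
    }
  }
}

class RegionScannerImpl {
  // pass "this" down so StoreScanner syncs on the region scanner's monitor
  private final StoreScanner storeScanner = new StoreScanner(this);

  // every normal scan entry point already locks the RegionScannerImpl,
  // so no extra locking is needed inside StoreScanner
  public synchronized boolean next(List<Object> results) {
    return false; // would call into storeScanner under the lock
  }

  // the exception: callers of nextRaw must synchronize on this scanner themselves
  public boolean nextRaw(List<Object> results) {
    return false;
  }
}
{code}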

Item #4 is much coarser than locking at the StoreScanner level. StoreScanner.peek is by far the worst offender, as it is called all over the place. There is no way in StoreScanner (that I see) to avoid locking every single operation, and each of those locks implies a memory fence (a read and write barrier in this case). As said above, the lock is almost never contended; the problem is the memory fences, which *kill* multi-core performance.
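
To make the cost concrete, a hedged contrast of the two schemes (invented class names; the real StoreScanner is more involved):

{code:java}
class FineGrainedStoreScanner {
  private Object current;

  // called for essentially every cell; even when uncontended, monitor
  // enter/exit acts as a memory fence (read and write barrier here),
  // and those fences are what hurt on many cores
  synchronized Object peek() {
    return current;
  }
}

class CoarseStoreScanner {
  private Object current;

  // no lock at all; the caller (the region scanner) already holds the
  // single monitor that updateReaders will also take
  Object peek() {
    return current;
  }
}
{code}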

This leads to the caveat listed above: very heavy read load can essentially prevent flushes or compactions from finishing. But note that this is *already* the case; it is just currently more likely that the flush/compaction will get through, because the locks are more fine grained. Check out StoreScanner.next(List<Cell>): it already holds the lock for the entire duration of the row fetch. This patch coarsens that from the row to the Scan's batch, and widens the scope from the store up to the region, so reads on other stores can lock out flushes/compactions of a store.
Also note that compactions usually run a long time and only need the lock once, to switch the readers around; the same goes for flushes. I still need to do testing, but I doubt it's an issue.
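
The caveat in miniature, as a hypothetical sketch (invented names):

{code:java}
class ReaderSwitch {
  private final Object regionScannerLock = new Object();

  void commitCompaction() {
    // all the expensive compaction I/O has already happened without any lock;
    // the lock is needed exactly once, to swap in the new readers:
    synchronized (regionScannerLock) {
      // point the scanners at the newly written files
    }
    // but if scans re-acquire regionScannerLock for every batch, a hot
    // read workload can keep this single acquisition waiting for a long time
  }
}
{code}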

Fair locking can help here, but comes with other issues.
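
For illustration only (not what the patch does): a fair ReentrantLock would hand the lock to the longest-waiting thread, letting the one flush/compaction thread in ahead of a stream of readers, at the cost of slower handoffs:

{code:java}
import java.util.concurrent.locks.ReentrantLock;

class FairLockSketch {
  private final ReentrantLock lock = new ReentrantLock(true); // fair=true: FIFO handoff

  void scanBatch() {
    lock.lock();
    try {
      // fetch one batch of cells
    } finally {
      lock.unlock();
    }
  }
  // fair handoff prevents starvation but is slower than the default
  // barging mode on every acquisition, one of the "other issues" above
}
{code}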

I've done local testing (a single-node HDFS cluster with a single-node HBase running on top).

Let me know if the patch is clear, and if not, what I need to change. Is it worth doing?



> Coarsen StoreScanner locks to RegionScanner
> -------------------------------------------
>
>                 Key: HBASE-13082
>                 URL: https://issues.apache.org/jira/browse/HBASE-13082
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>         Attachments: 13082.txt
>
>
> Continuing where HBASE-10015 left off.
> We can avoid locking (and memory fencing) inside StoreScanner by deferring to the lock already held by the RegionScanner.
> In tests this shows quite a scan improvement and reduced CPU (the fences make the cores wait for memory fetches).
> There are some drawbacks too:
> * All calls to RegionScanner need to remain synchronized.
> * Implementors of coprocessors need to be diligent in following the locking contract. For example, Phoenix does not lock around RegionScanner.nextRaw() as required in the documentation (not picking on Phoenix, this one is my fault, as I told them it's OK).
> * Possible starvation of flushes and compactions under heavy read load: RegionScanner operations would keep getting the locks, and the flushes/compactions would not be able to finalize the set of files.
> I'll have a patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
