hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16824) Make replacement of path the first operation during WAL rotation
Date Fri, 14 Oct 2016 21:40:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576576#comment-15576576 ]

Enis Soztutar commented on HBASE-16824:

The main problem is this: 
 - We use the SafePointZigZagLatch to coordinate the safe point between the log roller thread
and the RingBufferEventHandler thread. 
 - LogRoller starts the safe point process by signaling to the RBEH to start attaining the
safe point. 
 - RBEH sees this request and waits until the sync point is past the sequence of the last
item in the batch. By this time, everything should already be appended and waiting for the
 - RBEH waits for the highest synced sequence id to be greater than or equal to the waiting sequence
id, which ensures that the writer.sync() completes and the data is safe. This is the loop:
        while ((!this.shutdown && this.zigzagLatch.isCocked()
            && highestSyncedTxid.get() < currentSequence &&
            // We could be in here and all syncs are failing or failed. Check for this. Otherwise
            // we'll just be stuck here for ever. In other words, ensure there syncs running.
 - However, even though {{highestSyncedTxid.get() >= currentSequence}} holds at this point,
some other SyncRunner thread may still be trying to sync entries whose ids are less than highestSyncedTxid.
We have an optimization to return early without calling {{writer.sync()}}, but we cannot rely
on it, because a thread switch can happen between the check and the {{writer.sync()}} call.

 - This results in a case where we have already closed and replaced the writer, but a LogSyncer
thread calls writer.sync() on the already closed stream. All the SyncFutures will then get
exceptions rather than the success result (it should succeed because a higher trx id is already
 - The fix is conceptually simple: we also have to wait for all the SyncRunner threads to
finish their work in attainSafePoint.
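
The idea in the last bullet can be sketched in isolation. This is only an illustration, not the FSHLog code and not the patch for this issue: it swaps in a ReentrantReadWriteLock as one possible way to make the roller wait for in-flight syncs, and every name in it (SafePointSketch, Writer, roll, runDemo, failedSyncs) is hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: none of these names come from HBase.
final class SafePointSketch {
    /** Stand-in for a WAL writer; sync() fails once the writer is closed. */
    static final class Writer {
        private volatile boolean closed;
        void sync() {
            if (closed) throw new IllegalStateException("sync on closed writer");
        }
        void close() { closed = true; }
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private Writer writer = new Writer();   // guarded by lock
    final AtomicInteger failedSyncs = new AtomicInteger();

    /** Sync threads hold the read lock, so the writer cannot be swapped mid-sync. */
    void sync() {
        lock.readLock().lock();
        try {
            writer.sync();
        } catch (IllegalStateException e) {
            failedSyncs.incrementAndGet();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** The roller gets the write lock only after every in-flight sync has finished. */
    void roll() {
        lock.writeLock().lock();
        try {
            writer.close();
            writer = new Writer();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Runs four sync threads against one roller thread; returns the number of failed syncs. */
    static int runDemo() {
        SafePointSketch wal = new SafePointSketch();
        ExecutorService pool = Executors.newFixedThreadPool(5);
        for (int t = 0; t < 4; t++) {
            pool.submit(() -> { for (int i = 0; i < 10_000; i++) wal.sync(); });
        }
        pool.submit(() -> { for (int i = 0; i < 100; i++) wal.roll(); });
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("demo timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return wal.failedSyncs.get();
    }

    public static void main(String[] args) {
        System.out.println("failed syncs: " + runDemo());
    }
}
```

Because close-and-replace happens only while no sync holds the read lock, no sync in this sketch can reach a closed writer, so the failure counter should stay at zero. If the lock were removed from sync(), a thread could read the old writer, get descheduled, and call sync() after roll() had closed it, which is the race described above.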

> Make replacement of path the first operation during WAL rotation
> ----------------------------------------------------------------
>                 Key: HBASE-16824
>                 URL: https://issues.apache.org/jira/browse/HBASE-16824
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Atri Sharma
> In https://issues.apache.org/jira/browse/HBASE-12074, we hit an error if an async thread
calls flush on a WAL record already closed as the WAL is being rotated. This JIRA investigates
if setting the new WAL record path as the first operation during WAL rotation will fix the

