hbase-issues mailing list archives

From "Jeffrey Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-8208) Data could not be replicated to slaves when deferredLogSync is enabled
Date Fri, 29 Mar 2013 18:21:16 GMT

     [ https://issues.apache.org/jira/browse/HBASE-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Zhong updated HBASE-8208:
---------------------------------

    Attachment: hbase-8208.patch

{quote}
So, should we just call sync() in FSLog.startCacheFlush() regardless of the replication state?
It seems harmless.
{quote}
That's a good idea. I put sync() inside the function internalFlushcache instead of FSHLog.startCacheFlush(),
because the latter is called while holding updatesLock.writelock, whereas wal.sync does not seem to need
the lock. I put sync() before {code}mvcc.waitForRead(w);{code} to hopefully take some advantage
of the wait.
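
To make the ordering concrete, here is a minimal sketch of why syncing before the flush matters. It is a toy in-memory model, not HBase code: ToyRegion, its fields, and crashAndFindUnreplicated are hypothetical names invented for illustration, and "replication" is reduced to "whatever reached the synced WAL".

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; none of these names exist in HBase. */
class ToyRegion {
    final List<String> memstore = new ArrayList<>();   // edits not yet flushed
    final List<String> walPending = new ArrayList<>(); // appended to the WAL, not yet synced
    final List<String> walSynced = new ArrayList<>();  // durable WAL edits; replication reads these
    final List<String> storeFiles = new ArrayList<>(); // edits persisted by a flush

    /** With deferred log sync, a put only appends to the WAL; it does not sync. */
    void put(String edit) {
        memstore.add(edit);
        walPending.add(edit);
    }

    /** Makes every pending WAL edit durable. A no-op when nothing is pending. */
    void syncWal() {
        walSynced.addAll(walPending);
        walPending.clear();
    }

    /** Flushes the memstore, optionally syncing the WAL first. */
    void flush(boolean syncFirst) {
        if (syncFirst) {
            syncWal();
        }
        storeFiles.addAll(memstore);
        memstore.clear();
    }

    /**
     * Simulates a server crash: unsynced WAL edits are gone.
     * Returns the edits that were flushed but can never be replicated.
     */
    List<String> crashAndFindUnreplicated() {
        walPending.clear();
        List<String> missing = new ArrayList<>(storeFiles);
        missing.removeAll(walSynced);
        return missing;
    }
}
```

In this model, flush(false) followed by a crash leaves flushed edits that never reach the synced WAL, while flush(true) leaves none.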

I also moved the check {code}txid <= this.syncedTillHere{code} to the beginning of the function
syncer(long txid) so that it can skip acquiring this.updateLock in some cases.
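
As a rough illustration of that early check, here is a sketch assuming a toy log class. ToyLog, lastAppended, and lockAcquisitions are hypothetical and exist only to make the effect observable; the real syncer in FSHLog does considerably more work.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy model only; this is not the real FSHLog. */
class ToyLog {
    private final Object updateLock = new Object();
    private final AtomicLong lastAppended = new AtomicLong();   // highest txid appended so far
    private final AtomicLong syncedTillHere = new AtomicLong(); // highest txid known to be synced
    int lockAcquisitions = 0; // instrumentation for this sketch only

    /** Appends an edit and returns its transaction id. */
    long append() {
        return lastAppended.incrementAndGet();
    }

    /** Syncs everything up to at least txid. */
    void syncer(long txid) {
        // Early exit: the edit is already synced, so the lock is never touched.
        if (txid <= syncedTillHere.get()) {
            return;
        }
        long target;
        synchronized (updateLock) {
            lockAcquisitions++;
            // A real WAL would hand its pending edits to the writer here.
            target = lastAppended.get();
        }
        // A real WAL would sync the underlying file here, outside the lock.
        syncedTillHere.accumulateAndGet(target, Math::max);
    }
}
```

Calling syncer for a txid that an earlier sync already covered returns before the lock is taken, which is the saving the reordered check is after.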

Thanks,
-Jeffrey
                
> Data could not be replicated to slaves when deferredLogSync is enabled
> ----------------------------------------------------------------------
>
>                 Key: HBASE-8208
>                 URL: https://issues.apache.org/jira/browse/HBASE-8208
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.95.0, 0.98.0, 0.94.6
>            Reporter: Jeffrey Zhong
>             Fix For: 0.95.0, 0.98.0, 0.94.7
>
>         Attachments: hbase-8208.patch
>
>
> This is a subtle issue. When deferredLogSync is enabled, there is a chance that we flush data before all HLog entries have been synced. Suppose we have just flushed the internal cache and the server dies with some unsynced HLog entries.
> The data is not lost at the source cluster, but because replication is based on WAL files, some of the changes we flushed at the source won't be replicated to the slave clusters.
> Although enabling deferredLogSync means tolerating some data loss, this breaks the replication assumption that whatever is persisted in the source should be replicated to its slave clusters.
> In short, the slave cluster could end up with double losses: the data lost in the source, plus some data that is stored in the source cluster but may never be replicated to the slaves.
> The fix isn't hard. Basically, we can invoke sync during each flush when replication is enabled for a region server. Since sync returns immediately when there is nothing to sync, there should be no performance impact.
> Please let me know what you think!
> Thanks,
> -Jeffrey

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
