hbase-issues mailing list archives

From "Feng Honghua (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-10557) DroppedSnapshotException is not handled properly for flush triggered by hlog-replay and non-abort region close
Date Mon, 17 Feb 2014 09:17:19 GMT

     [ https://issues.apache.org/jira/browse/HBASE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Honghua updated HBASE-10557:
---------------------------------

    Description: Flushes triggered by hlog replay (replayRecoveredEdits) and by region close
(non-abort close) are processed directly by the region without putting a flush entry into
flushQueue, so they are not handled by MemStoreFlusher. As a result, a DroppedSnapshotException
thrown from internalFlushcache is not handled properly.  (was: During the code review while
investigating HBASE-10499, a possibility of data loss due to an unhandled DroppedSnapshotException
for user-triggered flushes was exposed.

Data loss can happen as follows:
# A flush for some region is triggered via HBaseAdmin or the shell.
# The request reaches the regionserver and eventually HRegion.internalFlushcache is called. It
fails while persisting the memstore's snapshot to an hfile, a DroppedSnapshotException is thrown,
and the snapshot is left uncleared.
# The DroppedSnapshotException is not handled in HRegion; it is just wrapped in a ServiceException
before being returned to the client.
# After a while, new writes are handled and put in the current memstore, and then a new flush
is triggered for the region because memstoreSize exceeds the flush threshold.
# This second (new) flush succeeds. For the HStore that failed in the previous user-triggered
flush, the remaining non-empty snapshot is used rather than a new snapshot made from the current
memstore, but HLog's latest sequenceId is used for the resulting hfiles. The sequenceId
attached to the hfiles says that all edits with sequenceId <= it have been persisted,
which is not true for the edits still in the existing memstore.
# Now the regionserver hosting this region dies.
# During the replay phase of failover, the edits that were in the memstore and not actually
persisted to hfiles when the previous regionserver died are ignored, since they are deemed
persisted when compared against the hfiles' latest sequenceId. These
edits are lost...
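
The sequence above can be replayed in a small toy model. This is only an illustrative sketch: ToyStore, flush, lostEdits and runScenario are hypothetical names, not HBase classes or methods, and the model assumes a single store whose edits are keyed by WAL sequence id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of the scenario above; none of these types are HBase APIs.
public class StaleSnapshotDemo {
    /** One store with an active memstore and a snapshot, keyed by WAL sequence id. */
    static class ToyStore {
        TreeMap<Long, String> memstore = new TreeMap<>();
        TreeMap<Long, String> snapshot = new TreeMap<>();
        List<Long> persisted = new ArrayList<>(); // sequence ids written to "hfiles"
        long maxFlushedSeqId = -1;                // sequence id stamped on the newest hfile

        void put(long seqId, String value) {
            memstore.put(seqId, value);
        }

        /** Flushes; failPersist models a DroppedSnapshotException that leaves the snapshot behind. */
        boolean flush(long latestWalSeqId, boolean failPersist) {
            if (snapshot.isEmpty()) {             // a leftover snapshot is reused, not rebuilt
                snapshot = memstore;
                memstore = new TreeMap<>();
            }
            if (failPersist) {
                return false;                     // snapshot is left uncleared
            }
            persisted.addAll(snapshot.keySet());
            snapshot = new TreeMap<>();
            maxFlushedSeqId = latestWalSeqId;     // claims every edit <= this id is on disk
            return true;
        }
    }

    /** Edits that replay would skip (seqId <= maxFlushedSeqId) although they were never persisted. */
    static List<Long> lostEdits(ToyStore store, List<Long> wal) {
        List<Long> lost = new ArrayList<>();
        for (long seqId : wal) {
            boolean skippedByReplay = seqId <= store.maxFlushedSeqId;
            if (skippedByReplay && !store.persisted.contains(seqId)) {
                lost.add(seqId);
            }
        }
        return lost;
    }

    static List<Long> runScenario() {
        ToyStore store = new ToyStore();
        List<Long> wal = new ArrayList<>();
        for (long id = 1; id <= 2; id++) { store.put(id, "v" + id); wal.add(id); }
        store.flush(2, true);   // user-triggered flush fails, snapshot {1, 2} stays behind
        for (long id = 3; id <= 4; id++) { store.put(id, "v" + id); wal.add(id); }
        store.flush(4, false);  // second flush persists the stale snapshot but stamps seqId 4
        return lostEdits(store, wal); // the server dies here; the memstore {3, 4} is gone
    }

    public static void main(String[] args) {
        System.out.println("lost edits: " + runScenario());
    }
}
```

In this model the edits with sequence ids 3 and 4 exist only in the memstore when the server dies, yet replay skips them because the hfile is stamped with sequence id 4.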

For the second flush, we also can't discard the remaining snapshot and make a new one from the
current memstore, because the data in the remaining snapshot would then be lost. We should abort the
regionserver immediately and rely on failover to replay the log for data safety.
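
A minimal sketch of that abort-on-failure handling, for a flush that bypasses the flush queue, might look like the following. The nested DroppedSnapshotException, Abortable and FlushAction types are stand-ins defined here for illustration only (assumptions, not the real HBase classes), and flushOrAbort is a hypothetical helper rather than the actual patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch only: the nested types are stand-ins, not the real HBase classes.
public class FlushAbortSketch {
    /** Stand-in for HBase's DroppedSnapshotException. */
    static class DroppedSnapshotException extends IOException {
        DroppedSnapshotException(String msg) { super(msg); }
    }

    /** Hypothetical hook that takes the regionserver down. */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    /** Hypothetical flush entry point, e.g. a wrapper around a region flush. */
    interface FlushAction {
        void run() throws IOException;
    }

    /**
     * Runs a flush that does not go through the flush queue. On a
     * DroppedSnapshotException the server is aborted so that log replay during
     * failover can recover the edits. Returns true only if the flush completed.
     */
    static boolean flushOrAbort(FlushAction flush, Abortable server) throws IOException {
        try {
            flush.run();
            return true;
        } catch (DroppedSnapshotException e) {
            server.abort("Flush failed and left a snapshot behind; log replay required", e);
            return false;
        }
    }

    /** Simulates a failing flush and reports what happened. */
    static String demo() {
        boolean[] aborted = {false};
        boolean ok;
        try {
            ok = flushOrAbort(
                () -> { throw new DroppedSnapshotException("simulated hfile write failure"); },
                (why, cause) -> aborted[0] = true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return "flushed=" + ok + ", aborted=" + aborted[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Other IOExceptions still propagate to the caller in this sketch; only the dropped-snapshot case triggers an abort.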

DroppedSnapshotException is correctly handled in MemStoreFlusher for internally triggered
flushes (which are generated by flush-size / rollWriter / periodicFlusher). But a user-triggered
flush is processed directly by HRegionServer->HRegion without putting a flush entry into
flushQueue, and hence is not handled by MemStoreFlusher)
        Summary: DroppedSnapshotException is not handled properly for flush triggered by hlog-replay
and non-abort region close  (was: Possible data loss due to non-handled DroppedSnapshotException
for user-triggered flush from client/shell)

> DroppedSnapshotException is not handled properly for flush triggered by hlog-replay and
non-abort region close
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10557
>                 URL: https://issues.apache.org/jira/browse/HBASE-10557
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Feng Honghua
>            Assignee: Feng Honghua
>            Priority: Critical
>
> Flushes triggered by hlog replay (replayRecoveredEdits) and by region close (non-abort close)
are processed directly by the region without putting a flush entry into flushQueue, so they are
not handled by MemStoreFlusher. As a result, a DroppedSnapshotException thrown from
internalFlushcache is not handled properly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
