hbase-issues mailing list archives

From "chunhui shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7671) Flushing memstore again after last failure could cause data loss
Date Mon, 28 Jan 2013 05:31:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564052#comment-13564052 ]

chunhui shen commented on HBASE-7671:
-------------------------------------

Each flushed file has a sequence id, and all hlog entries that are smaller than the max sequence
id will be skipped when replaying edit logs.

This means that, at flush time, all data in the memstore smaller than the sequence id has been
flushed to the storefile. That guarantee is broken when snapshot is called again without clearing
the previous snapshot because the last flush failed.
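
To make the failure mode concrete, here is a minimal toy model of the sequence described above. It is
not HBase code: the class name, fields and methods are all hypothetical, and it assumes that replay
skips every hlog entry whose id is not greater than the storefile's sequence id.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the scenario, NOT HBase code.
public class FlushReplaySketch {
    static final List<Long> wal = new ArrayList<>();       // sequence id of every logged edit
    static final List<Long> memstore = new ArrayList<>();  // edits not yet snapshotted
    static final List<Long> snapshot = new ArrayList<>();  // edits currently being flushed
    static final List<Long> storeFile = new ArrayList<>(); // edits persisted by a flush
    static long storeFileSeqId = -1;
    static long nextSeqId = 1;

    static void put() {
        long id = nextSeqId++;
        wal.add(id);
        memstore.add(id);
    }

    static void snapshot() {
        // Mirrors the warning in the log: a leftover snapshot means "doing nothing".
        if (!snapshot.isEmpty()) return;
        snapshot.addAll(memstore);
        memstore.clear();
    }

    static void flush(boolean commitFails) {
        long flushSeqId = nextSeqId - 1; // id of the latest edit at flush time
        snapshot();
        if (commitFails) return;         // snapshot is left in place, not cleared
        storeFile.addAll(snapshot);
        storeFileSeqId = flushSeqId;     // assumption: file is stamped with the new id
        snapshot.clear();
    }

    // Edits that are neither in the store file nor replayed from the log.
    static List<Long> lostAfterReplay() {
        List<Long> lost = new ArrayList<>();
        for (long id : wal) {
            boolean replayed = id > storeFileSeqId; // modelling assumption for the skip rule
            if (!replayed && !storeFile.contains(id)) lost.add(id);
        }
        return lost;
    }

    static List<Long> run() {
        wal.clear(); memstore.clear(); snapshot.clear(); storeFile.clear();
        storeFileSeqId = -1; nextSeqId = 1;
        put(); put(); put();  // edits 1..3
        flush(true);          // first flush fails at commit, snapshot {1,2,3} remains
        put(); put();         // edits 4..5 arrive in the memstore
        flush(false);         // second flush writes the OLD snapshot with id 5
        return lostAfterReplay();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [4, 5]
    }
}
```

In this model edits 4 and 5 exist only in the memstore, yet replay skips them because the storefile
carries sequence id 5.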


bq.Could somehow store the correct sequenceId with the snapshot?
I have considered this solution, but it does not seem convenient with multiple stores. That way
we would have to maintain a sequence id for each memstore rather than one per region.

Snapshot means creating a snapshot of the current memstore. As long as data still belongs to the
memstore and has not been successfully flushed, IMO, copying any new KVs into the snapshot is reasonable.
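
One possible reading of that suggestion, as a rough sketch only: when a previous snapshot is still
pending, snapshot() moves the new KVs into it instead of doing nothing, so the flushed data matches
the newer sequence id. The class and its members are hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, NOT the code from the attached patch.
public class SnapshotMergeSketch {
    final List<Long> memstore = new ArrayList<>(); // sequence ids of unflushed edits
    final List<Long> snapshot = new ArrayList<>(); // edits selected for the next flush

    // Moves everything from the memstore into the snapshot, even when an
    // earlier snapshot was never cleared because the last flush failed.
    void snapshot() {
        snapshot.addAll(memstore);
        memstore.clear();
    }

    static List<Long> run() {
        SnapshotMergeSketch store = new SnapshotMergeSketch();
        store.memstore.addAll(List.of(1L, 2L, 3L));
        store.snapshot();                       // first flush attempt; commit fails
        store.memstore.addAll(List.of(4L, 5L)); // new edits arrive before the retry
        store.snapshot();                       // retry picks up the new edits as well
        return new ArrayList<>(store.snapshot);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [1, 2, 3, 4, 5]
    }
}
```

With the snapshot covering edits 1 to 5, a file flushed under sequence id 5 would contain everything
that replay later skips.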
                
> Flushing memstore again after last failure could cause data loss
> ----------------------------------------------------------------
>
>                 Key: HBASE-7671
>                 URL: https://issues.apache.org/jira/browse/HBASE-7671
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.4
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.96.0, 0.94.5
>
>         Attachments: HBASE-7671.patch, HBASE-7671v2.patch, HBASE-7671v3.patch
>
>
> See the following logs first:
> {code}
> 2013-01-23 18:58:38,801 INFO org.apache.hadoop.hbase.regionserver.Store: Flushed , sequenceid=9746535080, memsize=101.8m, into tmp file hdfs://dw77.kgb.sqa.cm4:9900/hbase-test3/writetest1/8dc14e35b4d7c0e481e0bb30849cff7d/.tmp/bebeeecc56364b6c8126cf1dc6782a25
> 2013-01-23 18:58:41,982 WARN org.apache.hadoop.hbase.regionserver.MemStore: Snapshot called again without clearing previous. Doing nothing. Another ongoing flush or did we fail last attempt?
> 2013-01-23 18:58:43,274 INFO org.apache.hadoop.hbase.regionserver.Store: Flushed , sequenceid=9746599334, memsize=101.8m, into tmp file hdfs://dw77.kgb.sqa.cm4:9900/hbase-test3/writetest1/8dc14e35b4d7c0e481e0bb30849cff7d/.tmp/4eede32dc469480bb3d469aaff332313
> {code}
> The first memstore flush fails in commitFile() (after logging the first line above), which
> triggers a server abort, but another flush follows immediately (possibly caused by a move/split;
> the third line above) and succeeds.
> For the same memstore snapshot we get different sequence ids, which causes data loss when
> replaying log edits.
> See the unit test case in the patch for details.

