hbase-dev mailing list archives

From "Yunfan Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-10466) Wrong calculation of total memstore size in HRegion which could cause data loss
Date Tue, 04 Feb 2014 22:04:12 GMT
Yunfan Zhong created HBASE-10466:
------------------------------------

             Summary: Wrong calculation of total memstore size in HRegion which could cause data loss
                 Key: HBASE-10466
                 URL: https://issues.apache.org/jira/browse/HBASE-10466
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.89-fb
            Reporter: Yunfan Zhong
            Priority: Critical
             Fix For: 0.89-fb


When a flush fails, the data that was to be flushed is kept in each MemStore's snapshot, and
the next flush attempt continues with that snapshot first. However, after a flush succeeds,
the counter of total memstore size in HRegion is always reduced by the sum of the current
memstore sizes. This calculation is wrong if the previous flush failed.
When the server shuts down, there are two flushes. In the incident where KVs went missing,
the first flush successfully saved the data in the snapshot, but the size counter was reduced
to 0 or even less. This prevented the second flush, since HRegion.internalFlushcache() returns
immediately when the total memstore size is not greater than 0. As a result, the data in the
memstores was lost.
This can cause mass data loss, up to the size limit of each memstore. For example, one region
had 516.3M of data in its memstore (the size limit is 516M) because flushes had been failing
for more than one hour. After the region was closed, these KVs were missing from the HFiles
but still existed in the HLog.
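
The sequence described above can be reproduced with a small toy model. This is only a
sketch and not the 0.89-fb code: the class MemstoreCounterSketch, its fields and the
decrementByFlushedBytes switch are hypothetical names. It also assumes that the amount
subtracted after a successful flush is the size of everything buffered when the flush
started, and that a leftover snapshot is written while the live memstore stays untouched.

```java
// Toy model of one region with a single memstore. All names are hypothetical.
final class MemstoreCounterSketch {
    long active;        // bytes in the live memstore
    long snapshot;      // bytes kept in the snapshot, waiting to be flushed
    long totalCounter;  // stands in for HRegion's total memstore size counter
    long persisted;     // bytes written out to store files

    void put(long bytes) {
        active += bytes;
        totalCounter += bytes;
    }

    // Returns false when the flush is skipped because the counter is not positive.
    boolean flush(boolean ioSucceeds, boolean decrementByFlushedBytes) {
        if (totalCounter <= 0) {
            return false; // models the early return when total size is not greater than 0
        }
        long sizeAtStart = active + snapshot; // assumption: size recorded before flushing
        if (snapshot == 0) {
            // No leftover snapshot: move the live data into the snapshot.
            snapshot = active;
            active = 0;
        }
        // Otherwise the leftover snapshot is flushed first and live data stays put.
        if (!ioSucceeds) {
            return true; // failed flush: snapshot is kept, counter is not changed
        }
        long flushed = snapshot;
        persisted += flushed;
        snapshot = 0;
        // Buggy variant subtracts the recorded size, fixed variant subtracts what was written.
        totalCounter -= decrementByFlushedBytes ? flushed : sizeAtStart;
        return true;
    }

    long buffered() {
        return active + snapshot;
    }

    // Replays: write, failed flush, write again, then the two flushes done on close.
    static long lostOnClose(boolean decrementByFlushedBytes) {
        MemstoreCounterSketch region = new MemstoreCounterSketch();
        region.put(100);
        region.flush(false, decrementByFlushedBytes); // fails, 100 bytes stay in the snapshot
        region.put(50);
        region.flush(true, decrementByFlushedBytes);  // first flush on close
        region.flush(true, decrementByFlushedBytes);  // second flush on close
        return region.buffered();                     // bytes never written to store files
    }

    public static void main(String[] args) {
        System.out.println("buggy decrement, bytes lost: " + lostOnClose(false));
        System.out.println("fixed decrement, bytes lost: " + lostOnClose(true));
    }
}
```

In this model the buggy variant leaves 50 bytes in the live memstore while the counter
already reads 0, so the second flush is skipped. Subtracting only the bytes actually
written keeps the counter at 50, and the second flush writes the remaining data.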



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
