hbase-issues mailing list archives

From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4645) Edits Log recovery losing data across column families
Date Fri, 21 Oct 2011 21:56:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133105#comment-13133105 ]

jiraposter@reviews.apache.org commented on HBASE-4645:
------------------------------------------------------



bq.  On 2011-10-21 18:31:16, Lars Hofhansl wrote:
bq.  > Wow... Losing data?! And that went unnoticed for so long.
bq.  > I guess I don't understand when the store's maxid go out of sync and why this does not happen all the time.
bq.  > 
bq.  > Nice find!!

Currently, we seem to flush all the column families together.

For this failure scenario to kick in, there has to be a failure after *some* stores have flushed, but not all of them.

Unclean shutdowns are rare, and one in the middle of a flush is rarer still.

But it is important to fix, now that we know about it.
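
To make the failure window concrete, here is a minimal sketch of a toy model, not HBase code. The class `PartialFlushLoss`, the `Edit` type, and the rule that an edit is replayed only when its sequence ID is greater than the replay start point are all assumptions made for illustration. It lists the edits that are neither in a flushed file nor replayed when replay starts from the maximum across all stores.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartialFlushLoss {
    /** One log edit in this toy model: a sequence ID and the column family it touches. */
    static final class Edit {
        final long seqId;
        final String family;
        Edit(long seqId, String family) { this.seqId = seqId; this.family = family; }
    }

    /**
     * Returns the sequence IDs of edits that are neither persisted (seqId is above
     * their family's flushed maximum) nor replayed, when replay starts after the
     * maximum flushed sequence ID across ALL stores.
     */
    static List<Long> lostUnderMaxRule(List<Edit> wal, Map<String, Long> flushedMax) {
        long start = 0L;
        for (long v : flushedMax.values()) {
            start = Math.max(start, v);
        }
        List<Long> lost = new ArrayList<>();
        for (Edit e : wal) {
            boolean persisted = e.seqId <= flushedMax.get(e.family);
            boolean replayed = e.seqId > start; // assumed replay rule
            if (!persisted && !replayed) {
                lost.add(e.seqId);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        // Six edits alternating between two families: cf1 gets 1,3,5 and cf2 gets 2,4,6.
        List<Edit> wal = new ArrayList<>();
        for (long id = 1; id <= 6; id++) {
            wal.add(new Edit(id, id % 2 == 0 ? "cf2" : "cf1"));
        }

        // Clean flush: both stores persisted everything up to sequence ID 6.
        Map<String, Long> allFlushed = new LinkedHashMap<>();
        allFlushed.put("cf1", 6L);
        allFlushed.put("cf2", 6L);

        // Crash after cf1 flushed but before cf2 did.
        Map<String, Long> partial = new LinkedHashMap<>();
        partial.put("cf1", 6L);
        partial.put("cf2", 0L);

        System.out.println(lostUnderMaxRule(wal, allFlushed)); // prints []
        System.out.println(lostUnderMaxRule(wal, partial));    // prints [2, 4, 6]
    }
}
```

In this model nothing is lost when every store flushes, which would be consistent with the problem going unnoticed: it only shows up when the crash lands between the individual store flushes.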


bq.  On 2011-10-21 18:31:16, Lars Hofhansl wrote:
bq.  > src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java, line 289
bq.  > <https://reviews.apache.org/r/2524/diff/4/?file=52155#file52155line289>
bq.  >
bq.  >     Is this because before your change it would not need to replay any logs, but now it does?

Yes. Earlier we would not have replayed any logs, and the test was ensuring that we don't.

Now that we base our decision on the minimum across the different Stores, we do end up replaying some edits.
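
As a rough illustration of why the expectation changes, the sketch below compares the two start points on a hypothetical two-store region. The class `ReplayStartPoint`, its method names, the sample sequence IDs, and the rule that edits with a sequence ID above the start point get replayed are assumptions for this example. They are not taken from HRegion.java or TestWALReplay.java.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class ReplayStartPoint {
    /** Old rule: start after the largest per-store maximum sequence ID. */
    static long maxAcrossStores(Map<String, Long> maxSeqIdPerStore) {
        return Collections.max(maxSeqIdPerStore.values());
    }

    /** New rule: start after the smallest per-store maximum sequence ID. */
    static long minAcrossStores(Map<String, Long> maxSeqIdPerStore) {
        return Collections.min(maxSeqIdPerStore.values());
    }

    /** Counts log sequence IDs strictly greater than the start point (assumed replay rule). */
    static long countReplayed(long[] walSeqIds, long start) {
        return Arrays.stream(walSeqIds).filter(id -> id > start).count();
    }

    public static void main(String[] args) {
        // Hypothetical state: cf1 has flushed up to 10, cf2 only up to 7.
        Map<String, Long> stores = new LinkedHashMap<>();
        stores.put("cf1", 10L);
        stores.put("cf2", 7L);

        // Hypothetical edits still present in the log.
        long[] wal = {8L, 9L, 10L};

        System.out.println(countReplayed(wal, maxAcrossStores(stores))); // prints 0
        System.out.println(countReplayed(wal, minAcrossStores(stores))); // prints 3
    }
}
```

With the maximum as the start point nothing is replayed, which matches what the test used to assert. With the minimum, the edits above the lagging store's maximum are replayed. Both helpers throw `NoSuchElementException` on an empty map, so a real implementation would need to handle a region with no stores.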


- Amitanand


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2524/#review2755
-----------------------------------------------------------


On 2011-10-21 21:46:51, Amitanand Aiyer wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2524/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-21 21:46:51)
bq.  
bq.  
bq.  Review request for Ted Yu, Michael Stack, Jonathan Gray, Lars Hofhansl, Amitanand Aiyer, Kannan Muthukkaruppan, Karthik Ranganathan, and Nicolas Spiegelberg.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  There is a data loss happening (for some of the column families) when we do the replay logs.
bq.  
bq.  The bug seems to be from the fact that during replay-logs we only choose to replay
bq.  the logs from the maximumSequenceID across ALL the stores. This is wrong. If a
bq.  column family is ahead of others (because the crash happened before all the column
bq.  families were flushed), then we lose data for the column families that have not yet
bq.  caught up.
bq.  
bq.  The correct logic for replay should begin the replay from the minimum across the
bq.  maximum in each store.
bq.  
bq.  
bq.  This addresses bug hbase-4645.
bq.      https://issues.apache.org/jira/browse/hbase-4645
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 8c32839 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 966262b 
bq.  
bq.  Diff: https://reviews.apache.org/r/2524/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Initial patch. v1.
bq.  
bq.  mvn test (running).
bq.  
bq.  TBD: add a test case to repro the issue and make sure it fixes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Amitanand
bq.  
bq.


                
> Edits Log recovery losing data across column families
> -----------------------------------------------------
>
>                 Key: HBASE-4645
>                 URL: https://issues.apache.org/jira/browse/HBASE-4645
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.89.20100924, 0.92.0
>            Reporter: Amitanand Aiyer
>            Assignee: Amitanand Aiyer
>
> There is a data loss happening (for some of the column families) when we do the replay logs.
> The bug seems to be from the fact that during replay-logs we only choose to replay
> the logs from the maximumSequenceID across *ALL* the stores. This is wrong. If a
> column family is ahead of others (because the crash happened before all the column
> families were flushed), then we lose data for the column families that have not yet
> caught up.
> The correct logic for replay should begin the replay from the minimum across the
> maximum in each store. 


        
