accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-1177) Decrease time it takes to recover after tablet server failures
Date Mon, 25 Mar 2013 18:43:16 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612967#comment-13612967 ]

Keith Turner commented on ACCUMULO-1177:
----------------------------------------

I was thinking about the design for all of the 1.6 walog changes.  Currently each batch of
mutations that's written to a walog has a sequence number attached to it.  This sequence number
relates the mutations to an instance of an in-memory map.  Currently this sequence number
is recorded in start and stop minor compaction events in the walog.  I'm thinking we can
possibly dispense with these start and stop minc events in the walog and instead store the
seq# in the metadata table.  This could be done with the same mutation that writes out a new
minor compaction file to the metadata table, which would make it atomic.  It could be stored
in a new column.  It would need to increase monotonically for the lifetime of a tablet.
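
A minimal sketch of the metadata update described above, not Accumulo code.  It models one
tablet's metadata row in memory; the class name, the flushed_seq column, and the file name
in the usage note are all hypothetical.  It assumes the seq# recorded is that of the
in-memory map that was just flushed.

```python
class TabletMetadata:
    """Hypothetical in-memory stand-in for one tablet's row in the metadata table."""

    def __init__(self):
        self.files = []        # minor compaction files recorded for this tablet
        self.flushed_seq = -1  # hypothetical new column; -1 means nothing flushed yet

    def record_minor_compaction(self, new_file, seq):
        """Record a new minor compaction file and its seq# as one update."""
        # The seq# must increase monotonically for the lifetime of the tablet.
        if seq <= self.flushed_seq:
            raise ValueError(
                "seq# %d is not greater than recorded seq# %d" % (seq, self.flushed_seq)
            )
        # Both fields change together, modelling a single metadata mutation
        # that adds the file entry and the seq# column atomically.
        self.files.append(new_file)
        self.flushed_seq = seq
```

For example, calling record_minor_compaction("F0001.rf", 3) on a fresh row leaves
flushed_seq at 3, and a later call with seq 3 or lower is rejected.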

I think this will have two benefits.

 * For ACCUMULO-1083, I think this will greatly simplify minor compactions and recovery.
I think it avoids having to group logs and consider each group in order at recovery time.
 We would not need to write start and stop minc events to all active walog groups (even if
no mutations were written to the current walog of the group).
 * For this issue, I think it leads to the possibility of sorting less data at recovery time.
 We can analyze the metadata table and determine for each walog what (tablet, seq #) pairs
are needed.  Then we would only need to sort mutations where the (tablet, seq #) is greater
than what's needed.
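
A rough sketch of the recovery-time filter from the second point, not Accumulo code.  The
function name and data shapes are hypothetical.  It assumes the metadata table yields, per
tablet, the seq# of the last flushed in-memory map, and that a tablet with no recorded seq#
has flushed nothing.

```python
def mutations_to_sort(walog_entries, flushed_seq):
    """Keep only the walog entries that still need to be sorted for recovery.

    walog_entries: iterable of (tablet, seq, mutation) tuples read from one walog.
    flushed_seq:   dict mapping tablet -> seq# recorded in the metadata table.
    """
    needed = []
    for tablet, seq, mutation in walog_entries:
        # Assumption: a tablet missing from flushed_seq has flushed nothing,
        # so every one of its mutations is kept.
        if seq > flushed_seq.get(tablet, -1):
            needed.append((tablet, seq, mutation))
    return needed
```

With flushed_seq = {"t1": 2}, an entry ("t1", 2, m) is skipped while ("t1", 3, m) and any
entry for a tablet without a recorded seq# are kept.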

The drawback of this approach is that the walog will be less self-contained.


                
> Decrease time it takes to recover after tablet server failures
> --------------------------------------------------------------
>
>                 Key: ACCUMULO-1177
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1177
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Keith Turner
>             Fix For: 1.6.0
>
>
> Examine the end-to-end process for recovering from failures and look for ways to speed it up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
