accumulo-dev mailing list archives

From Josh Elser <>
Subject Understanding SortedLogRecovery
Date Tue, 13 May 2014 01:49:43 GMT
I have a few questions about the members of LogFileKey and how 
they're used to implement (sorted) log recovery. My purpose in asking is 
related to extracting LogFileKey/LogFileValue pairs with 
LogEvents.MUTATION and LogEvents.MANY_MUTATIONS for a specific table.

I'm reading through the SortedLogRecovery class, specifically the 
findLastStartToFinish and playbackMutations methods. The confusing 
members to me right now are the `tid` and `seq` members which are a part 
of every LogFileKey (meaning, they are included regardless of the value 
of the LogEvents member).

Looking at SortedLogRecovery.playbackMutations, it appears that if I know 
what extent matches with a `tid` (from a DEFINE_TABLET LogEvents), any 
mutations (MUTATION or MANY_MUTATIONS) with that `tid` are for that 
extent. Is that correct?
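
To make the assumption concrete, here is a minimal sketch of the lookup 
being described. It does not use the real Accumulo classes: TidSketch, 
Event, Entry and mutationsForExtent are hypothetical stand-ins, and the 
extent and mutation are plain strings. It only illustrates the idea that a 
DEFINE_TABLET entry binds a `tid` to an extent and that later mutation 
entries are attributed through that binding; whether the real WAL 
guarantees this is exactly the open question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types; NOT the real LogFileKey/LogFileValue API.
final class TidSketch {
    enum Event { DEFINE_TABLET, MUTATION, MANY_MUTATIONS, OTHER }

    static final class Entry {
        final Event event;
        final int tid;
        final String extent;   // only meaningful for DEFINE_TABLET
        final String payload;  // only meaningful for mutation events

        Entry(Event event, int tid, String extent, String payload) {
            this.event = event;
            this.tid = tid;
            this.extent = extent;
            this.payload = payload;
        }
    }

    // Walks the entries in file order and collects the payloads of
    // mutation entries whose tid is currently bound to wantedExtent.
    static List<String> mutationsForExtent(List<Entry> wal, String wantedExtent) {
        Map<Integer, String> tidToExtent = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Entry e : wal) {
            switch (e.event) {
                case DEFINE_TABLET:
                    // Assumption: the most recent definition for a tid wins.
                    tidToExtent.put(e.tid, e.extent);
                    break;
                case MUTATION:
                case MANY_MUTATIONS:
                    // A mutation whose tid has not been defined yet is skipped.
                    if (wantedExtent.equals(tidToExtent.get(e.tid))) {
                        out.add(e.payload);
                    }
                    break;
                default:
                    break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> wal = Arrays.asList(
            new Entry(Event.DEFINE_TABLET, 1, "t1;a", null),
            new Entry(Event.MUTATION, 1, null, "m1"),
            new Entry(Event.DEFINE_TABLET, 2, "t1;b", null),
            new Entry(Event.MUTATION, 2, null, "m2"),
            new Entry(Event.MANY_MUTATIONS, 1, null, "m3"));
        System.out.println(mutationsForExtent(wal, "t1;a")); // prints [m1, m3]
    }
}
```

Note that the sketch depends on a DEFINE_TABLET entry appearing before any 
mutation that uses its `tid`, which is the ordering property I am unsure 
about in an unsorted WAL.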

Assuming that's the case (as it appears from the code), when dealing 
with a WAL which is not yet sorted (what I'm trying to do with 
replication), can I still use that `tid` to know which mutations are 
associated with that extent? Is it possible to see a new DEFINE_TABLET 
LogFileKey come through, followed by more mutations using the old `tid` 
(in other words, is it possible that mutations under two different `tid` 
values for the same extent could be interspersed in an unsorted WAL)?
