hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8701) distributedLogReplay need to apply wal edits in the receiving order of those edits
Date Fri, 07 Jun 2013 21:50:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678475#comment-13678475 ]

stack commented on HBASE-8701:

Elliott and I had a chat and came up w/ something similar to the Enis proposal.  Let me write
it out.

# Add a new API to the RS called replay.  The method would take a WALEdit, or even a list of
WALEdits, which I think carry the seqid and target region.
# Replay all WAL edits into an in-memory recovered.edits file: i.e. a data structure that orders
edits by sequenceid.  If it has to spill because of memory pressure, that is ok; we would
write out this ordered-by-id "recovered.edits" file.
# On notification that all WALs have been replayed, play the ordered-by-id "recovered.edits"
into a memstore that is NOT the per-column-family memstore we use in normal operation (if
we spilled 'recovered.edits' to disk, the spilled edits will need to be merged w/ what we
still have in memory while doing the replay).  This 'other' memstore we call the
replay-memstore (RMS).  The replay skips the WAL.  If the RMS has to flush during replay,
that is fine.  The 'replayer' will provide the seqid to write the hfile out with (this could
be messy).  The seqid will be one gotten from the WALEdit, not from the currently hosting RS.
# Flush out the RMS using the seqid of the last edit.
# Flip the region to take reads.
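
The ordered-by-sequenceid structure in the steps above could be sketched roughly as follows. This is only an illustration, not HBase code: the names ReplayBuffer, Edit, replay, lastSeqId and drainInOrder are made up, spilling to disk is left out, and it assumes sequence ids are unique within a region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch, not HBase code: buffers replayed edits ordered by
// their original WAL sequence id until all WALs have been replayed.
final class ReplayBuffer {

    // Stand-in for a WAL edit; assumed to carry its original seqid.
    static final class Edit {
        final long seqId;
        final String region;
        final String value;

        Edit(long seqId, String region, String value) {
            this.seqId = seqId;
            this.region = region;
            this.value = value;
        }
    }

    // Sorted by seqid; assumes seqids are unique within a region, so a
    // re-sent edit simply overwrites its earlier copy.
    private final ConcurrentSkipListMap<Long, Edit> bySeqId =
            new ConcurrentSkipListMap<>();

    // The proposed "replay" call: accepts edits in any arrival order.
    void replay(List<Edit> edits) {
        for (Edit e : edits) {
            bySeqId.put(e.seqId, e);
        }
    }

    // Seqid of the last buffered edit, i.e. the id a flush would be tagged
    // with; -1 when nothing is buffered.
    long lastSeqId() {
        return bySeqId.isEmpty() ? -1L : bySeqId.lastKey();
    }

    // Called once all WALs are replayed: hands back edits in seqid order
    // and empties the buffer.
    List<Edit> drainInOrder() {
        List<Edit> ordered = new ArrayList<>(bySeqId.values());
        bySeqId.clear();
        return ordered;
    }
}
```

A caller would feed batches through replay() as they arrive from the splitting RSs, then call drainInOrder() on the "all WALs replayed" notification and play the result into the RMS.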

Elliott then went on to remark that what we want is MapReduce; our task would group by region,
then sort by sequenceid, and finally write out an hfile (with retries and recovery, etc.).
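
To make the group-by-region, sort-by-sequenceid shape concrete, here is a small in-memory sketch. It is not a MapReduce job and not HBase code; RegionGrouper and WalEntry are hypothetical names, and the final "write out an hfile" step is omitted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: the "shuffle" (group by region) and "sort"
// (by sequence id) phases of the task, done in memory.
final class RegionGrouper {

    // Stand-in for one WAL entry; assumed to know its region and seqid.
    static final class WalEntry {
        final String region;
        final long seqId;
        final String value;

        WalEntry(String region, long seqId, String value) {
            this.region = region;
            this.seqId = seqId;
            this.value = value;
        }
    }

    // Returns region name -> that region's entries in ascending seqid order.
    static Map<String, List<WalEntry>> groupAndSort(List<WalEntry> entries) {
        Map<String, List<WalEntry>> byRegion = new TreeMap<>();
        for (WalEntry e : entries) {
            byRegion.computeIfAbsent(e.region, k -> new ArrayList<>()).add(e);
        }
        for (List<WalEntry> regionEntries : byRegion.values()) {
            regionEntries.sort(Comparator.comparingLong((WalEntry e) -> e.seqId));
        }
        return byRegion;
    }
}
```

Each per-region list is then in the order a single writer would need to produce one sorted output file for that region.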

I like Enis's idea of putting off the sort till the last moment.

Chatting more w/ Elliott, if we let go of the requirement that we respect insert order when
two edits have the same coordinates, it certainly would make a lot of ops easier (this replay,
compactions, etc.).  It was also suggested that this would be a good topic for the hackathon
on Weds., if you fellas are going to come: http://www.meetup.com/hackathon/events/123403802/

> distributedLogReplay need to apply wal edits in the receiving order of those edits
> ----------------------------------------------------------------------------------
>                 Key: HBASE-8701
>                 URL: https://issues.apache.org/jira/browse/HBASE-8701
>             Project: HBase
>          Issue Type: Bug
>          Components: MTTR
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>             Fix For: 0.98.0, 0.95.2
> This issue happens in distributedLogReplay mode when recovering multiple puts of the
> same key + version (timestamp). After replay, the value of the key is nondeterministic.
> h5. The original scenario raised by [~eclark]:
> For all edits the rowkey is the same.
> There's a log with: [ A (ts = 0), B (ts = 0) ]
> Replay the first half of the log.
> A user puts in C (ts = 0)
> Memstore has to flush
> A new Hfile will be created with [ C, A ] and MaxSequenceId = C's seqid.
> Replay the rest of the Log.
> Flush
> The issue will also happen in similar situations, such as Put(key, t=T) in WAL1 and
> Put(key, t=T) in WAL2.
> h5. Below is the option I'd like to use:
> a) During replay, we pass the WAL file name hash in each replay batch, and the original WAL
> sequence id of each edit, to the receiving RS.
> b) Once a WAL is recovered, the replaying RS sends a signal to the receiving RS so the
> receiving RS can flush.
> c) In the receiving RS, each WAL file of a region sends its edits to a different memstore.
> (At a high level, we can picture this as sending changes to a new region object named
> (original region name + WAL name hash) that uses the original sequence ids.)
> d) Writes from normal traffic (writes are allowed during recovery) are put in the normal
> memstores as they are today and flush normally with new sequence ids.
> h5. The alternative options are listed below for reference:
> Option one
> a) Disallow writes during recovery.
> b) During replay, we pass the original WAL sequence ids.
> c) Hold flushes until all WALs of a recovering region are replayed. The memstore should be
> able to hold them because we only recover unflushed WAL edits. For edits with the same
> key + version, whichever has the larger sequence id wins.
> Option two
> a) During replay, we pass the original WAL sequence ids.
> b) For each WAL edit, we store the edit's original sequence id along with its key.
> c) During scanning, we use the original sequence id if it is present; otherwise we use the
> store file's sequence id.
> d) Compaction can just keep the put with the max sequence id.
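
The scan-time rule in option two can be sketched as below. This is an illustration under assumptions rather than HBase code: SeqIdTieBreak, Cell, effectiveSeqId and winner are hypothetical names, and both cells are assumed to already share the same key + version.

```java
import java.util.OptionalLong;

// Hypothetical sketch of option two: prefer a cell's stored original
// sequence id, falling back to the sequence id of its store file.
final class SeqIdTieBreak {

    // Stand-in for a stored put; originalSeqId is only present for edits
    // that arrived via replay.
    static final class Cell {
        final String value;
        final OptionalLong originalSeqId;
        final long storeFileSeqId;

        Cell(String value, OptionalLong originalSeqId, long storeFileSeqId) {
            this.value = value;
            this.originalSeqId = originalSeqId;
            this.storeFileSeqId = storeFileSeqId;
        }
    }

    // Original seqid if present, otherwise the store file's seqid.
    static long effectiveSeqId(Cell c) {
        return c.originalSeqId.orElse(c.storeFileSeqId);
    }

    // For two cells with the same key + version, the larger effective
    // seqid wins; a compaction could keep only this cell.
    static Cell winner(Cell a, Cell b) {
        return effectiveSeqId(a) >= effectiveSeqId(b) ? a : b;
    }
}
```

With this rule, a replayed edit that happens to land in a store file with a high sequence id is still ranked by the sequence id it had in the original WAL.
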
> Please let me know if you have better ideas.
