hbase-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8701) distributedLogReplay need to apply wal edits in the receiving order of those edits
Date Mon, 10 Jun 2013 17:54:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679684#comment-13679684 ]

Sergey Shelukhin commented on HBASE-8701:
-----------------------------------------

bq. A MultiStoreFile would have edits from one WAL file for all regions in the WAL? Which
region would it live in and how would it get cleaned up? (When all references had been dropped?)
We'd have to write 'reference' files into each region that pointed back to a range on this
WAL? Wouldn't we be making near as many NN operations as for the case where we wrote out an
hfile per region?
It will have to use some form of links and references, and it will have edits from multiple stores.
Something like that is actually described in the Bigtable paper :)
bq. I think this multistorefile notion is too complex.
bq. We could keep hfiles per region in memory and not write them until we had to, but then
we lose the incremental benefit and we start to arrive at the Enis/Elliott scheme?
Why would it be more complex? It should be pretty simple; we already have something like that.
The incremental benefit is auxiliary; the main benefit, in my view, is precisely that we avoid
additional in-memory or file structures, separate memstores in the same store that need to be
reconciled, and extra communication channels.

Agreed on having seqIds for each KV. It will increase file size, which is probably not a huge
deal, but it will solve many problems. Should we do it now, while we are still before the singularity?
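
To make the per-KV seqId idea concrete, here is a minimal sketch of how a seqId stored with each KV could break ties between edits that share a row key and timestamp. This is not HBase code: the `SeqIdKv` class, its fields, and `visibleValue` are hypothetical names for illustration, and the seqId values are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a KV that carries its own sequence id (not an HBase class).
final class SeqIdKv {
    final String row;
    final long timestamp;
    final long seqId;   // assumption: the sequence id is stored with every KV
    final String value;

    SeqIdKv(String row, long timestamp, long seqId, String value) {
        this.row = row;
        this.timestamp = timestamp;
        this.seqId = seqId;
        this.value = value;
    }

    // Orders by row, then newest timestamp first, then largest seqId first.
    static final Comparator<SeqIdKv> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = Long.compare(b.timestamp, a.timestamp);
        if (c != 0) return c;
        return Long.compare(b.seqId, a.seqId);
    };

    // Returns the value a reader would see among KVs of one row:
    // the first KV in ORDER, independent of the order the KVs arrived in.
    static String visibleValue(List<SeqIdKv> kvs) {
        List<SeqIdKv> sorted = new ArrayList<>(kvs);
        sorted.sort(ORDER);
        return sorted.get(0).value;
    }

    public static void main(String[] args) {
        // A and B are replayed from the WAL; C is a live put during recovery.
        // All three share the same row key and ts = 0.
        List<SeqIdKv> kvs = Arrays.asList(
            new SeqIdKv("row", 0L, 1L, "A"),
            new SeqIdKv("row", 0L, 10L, "C"),
            new SeqIdKv("row", 0L, 2L, "B"));
        System.out.println(visibleValue(kvs)); // prints C
    }
}
```

With the tie broken per KV rather than per file, the result does not depend on which flush each edit happened to land in.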


                
> distributedLogReplay need to apply wal edits in the receiving order of those edits
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-8701
>                 URL: https://issues.apache.org/jira/browse/HBASE-8701
>             Project: HBase
>          Issue Type: Bug
>          Components: MTTR
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>             Fix For: 0.98.0, 0.95.2
>
>
> This issue happens in distributedLogReplay mode when recovering multiple puts of the
> same key + version (timestamp). After replay, the value of the key is nondeterministic.
> h5. The original concern, raised by [~eclark]:
> For all edits the rowkey is the same.
> There's a log with: [ A (ts = 0), B (ts = 0) ]
> Replay the first half of the log.
> A user puts in C (ts = 0)
> Memstore has to flush
> A new Hfile will be created with [ C, A ] and MaxSequenceId = C's seqid.
> Replay the rest of the Log.
> Flush
> The same issue arises in similar situations, such as Put(key, t=T) in WAL1 and
> Put(key, t=T) in WAL2.
> h5. Below is the option I'd like to use:
> a) During replay, we pass the WAL file name hash in each replay batch, and the original
> WAL sequence id of each edit, to the receiving RS.
> b) Once a WAL is recovered, the replaying RS sends a signal to the receiving RS so the
> receiving RS can flush.
> c) In the receiving RS, different WAL files of a region send edits to different memstores.
> (At a high level, we can visualize this as sending changes to a new region object named
> (origin region name + WAL name hash), using the original sequence ids.)
> d) Writes from normal traffic (writes are allowed during recovery) are put in normal
> memstores as they are today and flush normally with new sequence ids.
> h5. The alternative options are listed below for reference:
> Option one
> a) disallow writes during recovery
> b) during replay, we pass original wal sequence ids
> c) Hold flushes until all WALs of a recovering region are replayed. The memstore should
> be able to hold the edits because we only recover unflushed WAL edits. For edits with the
> same key + version, whichever has the larger sequence id wins.
> Option two
> a) During replay, we pass original wal sequence ids
> b) For each WAL edit, we store the edit's original sequence id along with its key.
> c) During scanning, we use the original sequence id if it is present; otherwise we use the
> store file's sequence id.
> d) Compaction can just keep the put with the max sequence id.
> Please let me know if you have better ideas.
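
For reference, steps a) to c) of the preferred option in the description above could be sketched roughly as follows. This is only an illustration under assumptions: `ReplayBuffers`, `replay`, and `walRecovered` are hypothetical names, not HBase APIs, a sorted map stands in for a memstore, and normal traffic from step d) is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical receiving-side buffers, one per source WAL (not an HBase class).
final class ReplayBuffers {
    // Keyed by WAL file name hash; each buffer keeps edits sorted by original seqId.
    private final Map<Integer, TreeMap<Long, String>> perWal = new HashMap<>();

    // Steps a) and c): each replayed edit arrives with its WAL name hash and
    // original sequence id, and goes to the buffer for that WAL only.
    void replay(int walNameHash, long originalSeqId, String edit) {
        perWal.computeIfAbsent(walNameHash, h -> new TreeMap<>())
              .put(originalSeqId, edit);
    }

    // Step b): the replaying RS signals that a WAL is fully recovered, so its
    // buffer can be flushed. Returns that WAL's edits in original seqId order.
    List<String> walRecovered(int walNameHash) {
        TreeMap<Long, String> buffer = perWal.remove(walNameHash);
        return buffer == null ? new ArrayList<>() : new ArrayList<>(buffer.values());
    }

    public static void main(String[] args) {
        ReplayBuffers buffers = new ReplayBuffers();
        buffers.replay(1, 2L, "B");   // edits may arrive out of order
        buffers.replay(1, 1L, "A");
        buffers.replay(2, 7L, "X");   // a different WAL of the same region
        System.out.println(buffers.walRecovered(1)); // prints [A, B]
    }
}
```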

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
