Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Fri, 7 Jun 2013 23:00:21 +0000 (UTC)
From: "Elliott Clark (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12651415.1370543927825.88090.1370646021882@arcas>
In-Reply-To: <JIRA.12651415.1370543927825@arcas>
References: <JIRA.12651415.1370543927825@arcas>
Subject: [jira] [Commented] (HBASE-8701) distributedLogReplay need to apply
 wal edits in the receiving order of those edits
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678551#comment-13678551 ] 

Elliott Clark commented on HBASE-8701:
--------------------------------------

bq. It seems to me that if we have edits with the same timestamps in different WAL files, this can only happen when the client explicitly sets the timestamps.

That holds only until we have a full multi-WAL implementation, which is definitely planned.
                
> distributedLogReplay need to apply wal edits in the receiving order of those edits
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-8701
>                 URL: https://issues.apache.org/jira/browse/HBASE-8701
>             Project: HBase
>          Issue Type: Bug
>          Components: MTTR
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>             Fix For: 0.98.0, 0.95.2
>
>
> This issue happens in distributedLogReplay mode when recovering multiple puts with the same key + version (timestamp). After replay, the value of the key is nondeterministic.
> h5. The original scenario raised by [~eclark]:
> For all edits the rowkey is the same.
> There's a log with: [ A (ts = 0), B (ts = 0) ]
> Replay the first half of the log.
> A user puts in C (ts = 0)
> Memstore has to flush
> A new HFile will be created with [ C, A ] and MaxSequenceId = C's seqid.
> Replay the rest of the Log.
> Flush
> The same issue occurs in similar situations, such as Put(key, t=T) in WAL1 and Put(key, t=T) in WAL2.
> h5. Below is the option I'd like to use:
> a) During replay, we pass the hash of the WAL file name in each replay batch, and the original WAL sequence id of each edit, to the receiving RS.
> b) Once a WAL is recovered, the replaying RS sends a signal to the receiving RS so the receiving RS can flush.
> c) In the receiving RS, edits from different WAL files of a region go to different memstores. (At a high level, we can visualize this as sending changes to a new region object named (original region name + WAL name hash) and using the original sequence ids.)
> d) Writes from normal traffic (writes are allowed during recovery) are put in the normal memstores as they are today and flush normally with new sequence ids.
> h5. The alternative options are listed below for reference:
> Option one
> a) Disallow writes during recovery.
> b) During replay, we pass the original WAL sequence ids.
> c) Hold flushes until all WALs of a recovering region are replayed. The memstore should be able to hold the edits because we only recover unflushed WAL edits. For edits with the same key + version, whichever has the larger sequence id wins.
> Option two
> a) During replay, we pass the original WAL sequence ids.
> b) For each WAL edit, we store the edit's original sequence id along with its key.
> c) During scanning, we use the original sequence id if it's present; otherwise we use the store file's sequence id.
> d) Compaction can just keep the put with the max sequence id.
> Please let me know if you have better ideas.
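
To make the "larger sequence id wins" rule from the quoted description concrete, here is a minimal Java sketch. It is only an illustration under stated assumptions: the names ReplayOrderSketch, Edit, originalSeqId, and resolve are hypothetical and are not HBase APIs, and the sequence ids in main are made up. It shows that once each replayed edit carries its original WAL sequence id, the winning value for a key no longer depends on the order in which the edits arrive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these names are real HBase classes.
class ReplayOrderSketch {

    /** One edit for a single row key: its value, cell timestamp, and original WAL sequence id. */
    static final class Edit {
        final String value;
        final long timestamp;
        final long originalSeqId; // assumption: passed along by the replaying RS

        Edit(String value, long timestamp, long originalSeqId) {
            this.value = value;
            this.timestamp = timestamp;
            this.originalSeqId = originalSeqId;
        }
    }

    /**
     * Picks the winning value: highest timestamp first, and for equal
     * timestamps the edit with the larger original sequence id.
     * Returns null for an empty list.
     */
    static String resolve(List<Edit> edits) {
        Edit winner = null;
        for (Edit e : edits) {
            boolean newer = winner == null
                    || e.timestamp > winner.timestamp
                    || (e.timestamp == winner.timestamp
                        && e.originalSeqId > winner.originalSeqId);
            if (newer) {
                winner = e;
            }
        }
        return winner == null ? null : winner.value;
    }

    public static void main(String[] args) {
        // A and B come from the WAL; C is a later client put. All share ts = 0.
        List<Edit> edits = new ArrayList<>(Arrays.asList(
                new Edit("A", 0L, 1L),
                new Edit("B", 0L, 2L),
                new Edit("C", 0L, 3L)));
        System.out.println(resolve(edits)); // prints C
        Collections.reverse(edits);         // simulate a different arrival order
        System.out.println(resolve(edits)); // still prints C
    }
}
```

Without the originalSeqId tie-break, the result for equal timestamps would depend on arrival and flush order, which is the nondeterminism the description reports.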
