hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11099) Two situations where we could open a region with smaller sequence number
Date Thu, 01 May 2014 05:59:16 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986374#comment-13986374 ]

stack commented on HBASE-11099:

[~jeffreyz] I don't think so. Down in append we do this:

    long sequence = this.disruptor.getRingBuffer().next();
    try {
      RingBufferTruck truck = this.disruptor.getRingBuffer().get(sequence);
      FSWALEntry entry =
        new FSWALEntry(sequence, logKey, edits, sequenceId, inMemstore, htd, info);
      truck.loadPayload(entry, scope.detach());
    } finally {

So we get a slot on the ring buffer and load it up.  When ready to go, we publish to the ring.

Threads contend hereabouts, so publishes can happen in any order (that could be OK).

(Reading setAvailable, which is called when we publish, I can't tell how it works without running
some tests; i.e., does publishing a sequence make it available for processing even though there are
sequences ahead of this one not yet published? I could do that.)
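
To make the question concrete, here is a toy model in plain Java. It is not the Disruptor's code and does not call its API; ToyRing and highestContiguousPublished are hypothetical names. It assumes one possible answer, namely that a consumer is only handed the contiguous prefix of published sequences, which is exactly what the test proposed above would need to confirm against the real library.

```java
import java.util.BitSet;

public class PublishOrderSketch {
    // Hypothetical stand-in for a multi-producer ring: remembers which claimed
    // sequences have been published so far.
    static final class ToyRing {
        private final BitSet published = new BitSet();
        private long nextToClaim = 0;

        // Claim the next sequence (analogous in spirit to getRingBuffer().next()).
        synchronized long next() {
            return nextToClaim++;
        }

        // Mark a claimed sequence as published; callers may do this in any order.
        synchronized void publish(long sequence) {
            published.set((int) sequence);
        }

        // Highest sequence s such that every sequence in 0..s is published,
        // or -1 if sequence 0 is still unpublished.
        synchronized long highestContiguousPublished() {
            return published.nextClearBit(0) - 1L;
        }
    }

    public static void main(String[] args) {
        ToyRing ring = new ToyRing();
        long a = ring.next();
        long b = ring.next();
        long c = ring.next();

        ring.publish(c); // published out of order: a and b are still pending
        System.out.println(ring.highestContiguousPublished()); // -1

        ring.publish(a);
        System.out.println(ring.highestContiguousPublished()); // 0

        ring.publish(b); // gap closed, so c becomes visible as well
        System.out.println(ring.highestContiguousPublished()); // 2
    }
}
```

Under this assumption, publishing in any order is harmless to the consumer: a late publisher only delays the sequences behind it.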

The ring buffer sequence number is an internal detail not related to the region sequence id. Wouldn't
I have to relate them to do the above (the ring buffer is regionserver-scoped)?  Otherwise, I would
have to synchronize -- i.e. block -- the disruptor so I could tie getting the disruptor id
and upping the region sequence id together?  Unless I used the disruptor id as the region
sequence id (would need to check that publish respects the disruptor id)?  The disruptor id is a
long.  At, say, 100k writes a second, I think it's about 3M years till we roll over (would have to
check -- the disruptor might be using some of the higher-order bits as flags).
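
A quick check of the rollover estimate, as a sketch: it assumes all 63 positive bits of a signed long are usable as the counter (no high-order flag bits) and a constant write rate. The class and method names are made up for illustration.

```java
public class RolloverEstimate {
    // Years until a signed 64-bit counter starting at 0 reaches Long.MAX_VALUE
    // at a constant rate of writesPerSecond increments per second.
    static double yearsUntilRollover(long writesPerSecond) {
        double seconds = (double) Long.MAX_VALUE / writesPerSecond;
        double secondsPerYear = 365.25 * 24 * 60 * 60;
        return seconds / secondsPerYear;
    }

    public static void main(String[] args) {
        // At 100k writes a second this comes out a little under 3 million years.
        System.out.println(yearsUntilRollover(100_000L) / 1e6 + " million years");
    }
}
```

Each high-order bit reserved as a flag would halve the figure, so even a handful of flag bits leaves plenty of headroom.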

Also at flush time, don't we want everything that could be in the snapshot sync'd rather than just
appended?  I know sync is a pretty faint guarantee, but it would be better than our using the
seqid of an edit that is not sync'd?  Thinking on it, this might not be necessary.  If the flush
succeeds, we probably had a sync come in in the meantime.  Could do a sync outside of the update
lock to be sure.

What do you think, boss?  (Thanks for the help here.)

> Two situations where we could open a region with smaller sequence number
> ------------------------------------------------------------------------
>                 Key: HBASE-11099
>                 URL: https://issues.apache.org/jira/browse/HBASE-11099
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.99.0
>            Reporter: Jeffrey Zhong
>             Fix For: 0.99.0
> Recently I happened to run into code where we could potentially open a region with a smaller
> sequence number:
> 1) Inside HRegion#internalFlushcache. This is because we changed the way WAL sync works: we now
> use late binding (the sequence number is assigned right before WAL sync).
> The flushSeqId may be less than the sequence number of a change included in the flush, which may
> cause later region-opening code to use a smaller-than-expected sequence number when we reopen
> the region.
> {code}
> flushSeqId = this.sequenceId.incrementAndGet();
> ...
> mvcc.waitForRead(w);
> {code}
> 2) HRegion#replayRecoveredEdits, where we have the following code:
> {code}
> ...
>           if (coprocessorHost != null) {
>             status.setStatus("Running pre-WAL-restore hook in coprocessors");
>             if (coprocessorHost.preWALRestore(this.getRegionInfo(), key, val)) {
>               // if bypass this log entry, ignore it ...
>               continue;
>             }
>           }
> ...
>           currentEditSeqId = key.getLogSeqNum();
> {code} 
> If a coprocessor skips some tail WALEdits, the function will return a smaller currentEditSeqId.
> In the end, a region may also open with a smaller sequence number. This may cause data loss
> because the Master may record a larger flushed sequence id, and some WALEdits may be skipped
> during recovery if the region fails again.
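
The second situation in the quoted description can be reproduced with a toy replay loop. This is only a sketch with hypothetical names (ReplaySeqIdSketch and both methods); it does not use HBase classes and is not presented as the fix adopted for this issue. It contrasts recording the sequence id after the bypass check, as in the quoted snippet, with recording it before.

```java
import java.util.List;
import java.util.function.LongPredicate;

public class ReplaySeqIdSketch {
    // Same ordering as the quoted code: a bypassed edit never updates the id.
    static long replayRecordingAfterHook(List<Long> editSeqIds, LongPredicate bypass) {
        long currentEditSeqId = -1;
        for (long seqId : editSeqIds) {
            if (bypass.test(seqId)) {
                continue; // edit skipped; its sequence id is lost
            }
            currentEditSeqId = seqId; // the edit would be applied here
        }
        return currentEditSeqId;
    }

    // Alternative ordering: record the id first, then decide whether to skip.
    static long replayRecordingBeforeHook(List<Long> editSeqIds, LongPredicate bypass) {
        long currentEditSeqId = -1;
        for (long seqId : editSeqIds) {
            currentEditSeqId = seqId;
            if (bypass.test(seqId)) {
                continue; // edit skipped, but its sequence id is still counted
            }
            // the edit would be applied here
        }
        return currentEditSeqId;
    }

    public static void main(String[] args) {
        List<Long> edits = List.of(10L, 11L, 12L, 13L);
        LongPredicate skipTail = seqId -> seqId >= 12; // hook bypasses the last two edits

        System.out.println(replayRecordingAfterHook(edits, skipTail));  // 11
        System.out.println(replayRecordingBeforeHook(edits, skipTail)); // 13
    }
}
```

With the first ordering the region would reopen at 11 even though edits up to 13 were seen in the log, which is the smaller-than-expected sequence number the description warns about.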

This message was sent by Atlassian JIRA
