hbase-issues mailing list archives

From "Himanshu Vashishtha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8741) Mutations on Regions in recovery mode might have same sequenceIDs
Date Wed, 19 Jun 2013 18:36:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688294#comment-13688294 ]

Himanshu Vashishtha commented on HBASE-8741:

Thanks for sharing the approach, Sergey.

One downside of the above approach is that it can block writes, and do so in a write-heavy
application.

Re: the bumping-up-sequenceID scheme: Stack raised a valid concern that a region opened in the
last WAL file might have an exceptionally high sequence number. Let's call such a region 'Rg'.

I was thinking of the following approach:

Based on the WAL file size, we can determine the maximum number of edits a WAL file can contain.
Let's say this maximum is X.
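As an illustrative sketch of how X could be derived (the 64 MB roll size and 100-byte minimum edit size below are assumptions for the example, not numbers from this discussion):

```java
public class WalEditEstimate {
    // Upper bound on the number of edits a single WAL file can hold:
    // the roll size divided by the smallest edit we expect to see.
    static long maxEditsPerWal(long walRollSizeBytes, long minEditSizeBytes) {
        return walRollSizeBytes / minEditSizeBytes;
    }

    public static void main(String[] args) {
        // Assumed values for illustration: 64 MB roll size, ~100-byte edits.
        long x = maxEditsPerWal(64L * 1024 * 1024, 100L);
        System.out.println("X = " + x);
    }
}
```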

a) There is a znode per rs: /hbase/sequenceId/rs1[]. It is updated whenever a region is opened
AND we find that we need to bump up the log sequenceId because the region has a larger sequenceId
in its HFiles than the regionserver's current log sequenceId. Let's say it reads value 'SqN2'.
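A minimal sketch of this update rule, where the field names are hypothetical and an in-memory field stands in for the ZooKeeper write:

```java
public class PerRsSequenceZnode {
    // The regionserver's current log sequenceId (illustrative field).
    long currentLogSeqId;
    // Stand-in for the value stored at /hbase/sequenceId/<rs> in ZooKeeper.
    long znodeValue;

    PerRsSequenceZnode(long initialLogSeqId) {
        this.currentLogSeqId = initialLogSeqId;
        this.znodeValue = initialLogSeqId;
    }

    // Called when a region is opened on this regionserver, with the maximum
    // sequenceId found in that region's HFiles.
    void onRegionOpen(long maxSeqIdInHFiles) {
        if (maxSeqIdInHFiles > currentLogSeqId) {
            // Bump the log sequenceId and record the bump in the per-rs znode
            // (in a real implementation this would be a ZooKeeper setData call).
            currentLogSeqId = maxSeqIdInHFiles;
            znodeValue = currentLogSeqId;
        }
    }
}
```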

Now, when processing a regionserver failover:
a) Read the trailer of the last completed WAL file to know the sequenceId at the time the
last log was rolled. Let's say the sequenceId is 'SqN1'.

b) While opening the regions of the failed rs in SSH, we read both 'SqN1', and 'SqN2'. If
'SqN1' > 'SqN2', then we are sure that no region 'Rg' was opened in the last WAL. Otherwise,
we use SqN2 in step c.

c) We would hint the new regionserver, while opening regions of the dead regionserver (these
regions will carry this info in HRegionInfo), to use sequenceNumber = SqN + nX, where 'n' is
the number of incomplete WAL files (those which don't have trailers). In the current case, n
is 1. If we have multiWAL, we would use the number of WALs we support.
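Putting steps (a)-(c) together, the safe starting sequenceId could be computed as in this sketch (method and parameter names are illustrative, not existing HBase APIs):

```java
public class RecoverySeqIdHint {
    // sqN1: sequenceId from the trailer of the last completed WAL file.
    // sqN2: value read from the per-rs znode /hbase/sequenceId/<rs>.
    // n:    number of incomplete (trailer-less) WAL files.
    // x:    maximum number of edits a single WAL file can hold.
    static long safeStartSeqId(long sqN1, long sqN2, int n, long x) {
        // If SqN1 > SqN2, no region with a bumped sequenceId was opened in
        // the last WAL, so the trailer value is the bound; otherwise SqN2 is.
        long sqN = Math.max(sqN1, sqN2);
        // Skip past everything the n trailer-less WALs could possibly contain.
        return sqN + (long) n * x;
    }
}
```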

Pros:
1) No blocking of writes; we add logic/processing only in the recovery path.
2) No reading of WALs multiple times.
3) Multi-WAL could be supported.

Cons:
1) An extra zk call. But this call is made _only_ when we are bumping up the sequenceId of the
regionserver.
2) One znode per rs. This would be cleaned up when the master is done processing the dead regionserver.

I think with this, we could allay our concerns about sequenceId collisions on the new regionserver,
and regions could be marked available for writes without waiting for distributed log replay/splitting
to finish.

Please let me know what you think of this. Thanks.
> Mutations on Regions in recovery mode might have same sequenceIDs
> -----------------------------------------------------------------
>                 Key: HBASE-8741
>                 URL: https://issues.apache.org/jira/browse/HBASE-8741
>             Project: HBase
>          Issue Type: Bug
>          Components: MTTR
>    Affects Versions: 0.95.1
>            Reporter: Himanshu Vashishtha
>            Assignee: Himanshu Vashishtha
> Currently, when opening a region, we find the maximum sequence ID from all its HFiles
> and then set the LogSequenceId of the log (in case the latter is at a smaller value). This
> works well in the recovered.edits case, as we are not writing to the region until we have
> replayed all of its previous edits.
> With distributed log replay, if we want to enable writes while a region is under recovery,
> we need to make sure that the logSequenceId > maximum logSequenceId of the old regionserver.
> Otherwise, we might have a situation where new edits have the same (or smaller) sequenceIds.
> If we can store region-level information in the WALTrailer, then this scenario could be
> avoided by:
> a) reading the trailer of the "last completed" file, i.e., the last WAL file which has a
> trailer, and
> b) completely reading the last WAL file (this file would not have the trailer, so it
> needs to be read in full).
> In future, if we switch to multiple WAL files, we could read the trailers of all completed
> WAL files, and read the remaining incomplete files in full.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
