From "Phil Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9465) Push entries to peer clusters serially
Date Thu, 29 Dec 2016 06:05:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784611#comment-15784611 ]

Phil Yang commented on HBASE-9465:

Thanks [~stack] for your comment.

bq. The latter keeps only the latest opening? Could it not have been amended to keep all?
info:seqnumDuringOpen saves the sequence id in its value, so it can only hold one value. But for
replication we need the sequence id for every time a region has been opened. For compatibility
I didn't want to change this column, so I created a new one. And an independent family (rep_barrier)
prevents reading too much data when we only want to read the info family.
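
As an aside, here is a minimal sketch of reading those barriers back from hbase:meta; the rep_barrier family name comes from this patch, but treating each cell value as a long seq id is just an assumption for illustration:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RepBarrierReader {
  /**
   * Print every open-sequence-id barrier recorded for one region.
   * Unlike info:seqnumDuringOpen, the rep_barrier family keeps one
   * cell per open, so all historical barriers stay available.
   */
  public static void printBarriers(byte[] metaRowOfRegion) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table meta = conn.getTable(TableName.META_TABLE_NAME)) {
      Get get = new Get(metaRowOfRegion);
      get.addFamily(Bytes.toBytes("rep_barrier"));
      Result result = meta.get(get);
      for (Cell cell : result.rawCells()) {
        // Assumed layout: each cell value holds one open sequence id.
        System.out.println(Bytes.toLong(cell.getValueArray(),
            cell.getValueOffset(), cell.getValueLength()));
      }
    }
  }
}
{code}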

bq. Do we have the position in two places now? Do we need to update zk if we've updated meta?
We save a position for each WAL file in ZK (of course we have HBASE-15867 now). For serial replication
we save a position for each region in meta. Two different positions.

bq. Because? It is not continuous against a peer? Seqid is 'continuous' within a region?
If I am not wrong, openSeq is the max sequence + 1, and the first log's sequence after opening
is openSeq + 1, so in fact we will never have a log in the WAL whose seq equals openSeq.
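
A tiny worked example of that gap (numbers invented):
{code:java}
// Worked example with invented numbers.
long maxSeqBeforeClose = 41;          // highest seq id before the region closed
long openSeq = maxSeqBeforeClose + 1; // 42, recorded as the barrier at open
long firstSeqAfterOpen = openSeq + 1; // 43, first edit written after the open
// No WAL entry ever carries seq 42 itself, so openSeq is a clean
// boundary between the edits before and after this open.
{code}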

bq. Why the -1 in the above? Because we add 1 when we open a region?

bq. We need to write this assumption into the code around where splits bring up daughters
on same RS as parent. This policy could change (Y! have put up a patch to make it so splits
do not bring up daughters on same servers as parent region).
Yes, the doc is old now. We have special logic for split/merge in the master branch now.

bq. This is another assumption of the design that needs to be marked in code so when this
changes, we'll accommodate the fix here.
OK. And in fact I have a plan to improve this. We can use only one thread to read the WAL
for non-recovery sources to reduce I/O pressure, and we should have some logic for when one
of the peers is blocked. I will file an issue when I am going to work on this.

bq. We do not write to the hbase:meta state of WALs, unless REPLICATION_SCOPE is set to 2?

bq. Can you say more on this?
WAL is server-level and a replication source is peer-level. So if, within a peer, one region's
log can not be pushed because of serial replication, all logs for this peer after that log are
also blocked. To prevent this we have to split these tables/cfs into different peers.
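
For context, serial replication is opted into per column family. A sketch of the config, assuming REPLICATION_SCOPE value 2 is the serial scope this issue adds (0 = off, 1 = normal replication):
{code:java}
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;

// Enable serial replication for one family only; the other families
// keep their existing scopes and are unaffected.
HTableDescriptor table = new HTableDescriptor(TableName.valueOf("t1"));
HColumnDescriptor family = new HColumnDescriptor("f1");
family.setScope(2); // REPLICATION_SCOPE of 2 = serial, per this issue
table.addFamily(family);
{code}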

bq. When would an Entry not be ready to push?
When the region this entry belongs to has some logs whose seq ids are smaller than this entry's
and they have not yet been pushed to the peer cluster, this entry can not be pushed.
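
A rough sketch of that readiness check; canPush and lastPushedSeq are hypothetical names for illustration, not the patch's actual API:
{code:java}
import java.util.Map;

final class SerialPushCheck {
  /**
   * Hypothetical readiness test. lastPushedSeq would mirror the
   * per-region rep_position kept in hbase:meta, and openSeqBarrier is
   * the barrier recorded when the current server opened the region.
   */
  static boolean canPush(String encodedRegionName, long openSeqBarrier,
                         Map<String, Long> lastPushedSeq) {
    long pushed = lastPushedSeq.getOrDefault(encodedRegionName, 0L);
    // openSeqBarrier - 1 is the last seq id the previous range can
    // contain (no WAL entry ever equals the barrier itself), so once
    // the pushed position reaches it, entries after the barrier are safe.
    return pushed >= openSeqBarrier - 1;
  }
}
{code}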

bq. Do we have an idea of how much extra load this fix puts on hbase:meta?
Puts to rep_barrier and rep_meta are part of the batch mutation when a region opens, so I think
it is not a big extra load. rep_position is updated frequently, but its QPS is the same as the
position logging on ZK. And these writes only happen when some families enable this feature.

bq. How do we get insight on say delay that is happening because another RS's thread is (we
think) replaying a WAL?
As long as we think a log can not be pushed, normally something is delayed, maybe a failover
or a region move. But we can not know the reason; we can only wait for that work to be done.
Unless there is a bug, they can eventually be pushed.

These days I am working on enabling this feature in our production cluster, so maybe there
will be something that needs to be improved. I hope I can say this feature is stable when 1.4 is released.

> Push entries to peer clusters serially
> --------------------------------------
>                 Key: HBASE-9465
>                 URL: https://issues.apache.org/jira/browse/HBASE-9465
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Replication
>    Affects Versions: 2.0.0, 1.4.0
>            Reporter: Honghua Feng
>            Assignee: Phil Yang
>             Fix For: 2.0.0, 1.4.0
>         Attachments: HBASE-9465-branch-1-v1.patch, HBASE-9465-branch-1-v1.patch, HBASE-9465-branch-1-v2.patch,
HBASE-9465-branch-1-v3.patch, HBASE-9465-branch-1-v4.patch, HBASE-9465-branch-1-v4.patch,
HBASE-9465-v1.patch, HBASE-9465-v2.patch, HBASE-9465-v2.patch, HBASE-9465-v3.patch, HBASE-9465-v4.patch,
HBASE-9465-v5.patch, HBASE-9465-v6.patch, HBASE-9465-v6.patch, HBASE-9465-v7.patch, HBASE-9465-v7.patch,
> When region-move or RS failure occurs in the master cluster, the hlog entries that were not
pushed before the region-move or RS failure will be pushed by the original RS (for region move)
or by another RS which takes over the remaining hlog of the dead RS (for RS failure), while the
new entries for the same region(s) will be pushed by the RS which now serves the region(s). They
push the hlog entries of the same region concurrently, without coordination.
> This treatment can possibly lead to data inconsistency between the master and peer clusters:
> 1. a put and then a delete are written to the master cluster
> 2. due to region-move / RS-failure, they are pushed to the peer cluster by different
replication-source threads
> 3. if the delete is pushed to the peer cluster before the put, and a flush and major-compact
occur in the peer cluster before the put arrives, the delete is collected while the put
remains in the peer cluster
> In this scenario, the put remains in the peer cluster, but in the master cluster the put is
masked by the delete, hence data inconsistency between the master and peer clusters
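
To make the quoted scenario concrete, an illustrative timeline (sequence ids invented):
{noformat}
master:  put(row, v)   seq=10
         delete(row)   seq=11
peer  :  delete(row) arrives first (pushed by the failover source)
         flush + major compaction collects the delete marker
         put(row, v) arrives later and is applied
result:  peer returns v for row; master returns nothing
{noformat}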
