hbase-issues mailing list archives

From "Yu Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6617) ReplicationSourceManager should be able to track multiple WAL paths
Date Wed, 26 Aug 2015 16:38:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14714485#comment-14714485 ]

Yu Li commented on HBASE-6617:
------------------------------

Hi [~zjushch],

Thanks for the review.

I've considered your point carefully, but I still think one replication source per WAL group
is the better approach, for the reasons below:

1. Regarding the semantics of ReplicationSource, I believe the relationship between sources
and a peer is many-to-one rather than one-to-one. One replication source stands for one kind
of source, and no matter how many kinds of source there are, we need to replicate all of them
to the specified peer. Before multi-WAL there was only one kind of source, which is just a
special case. Consider the heterogeneous storage implementation in HDFS: after different kinds
of disks were supported, the block report granularity changed from node-level to disk-level.
I think multiple WALs are quite similar to that.

2. From a business point of view, one WAL group may stand for one business. In our scenario
we created a namespace-based grouping strategy, so that regions of the same business write
to the same log group. In this case, one source per group lets us know the replication
latency of each business at the regionserver/cluster level.

3. Regarding deleting ReplicationSource instances, you can find the logic in ReplicationSourceManager#removePeer,
where the source is terminated first and then removed from the source list.

4. Regarding source metrics, we will use "peerId@groupId" as the id, so the reported metric
names will look like "source.<peerId@groupId>.ageOfLastShippedOp"; you can find the whole
logic in the constructor of MetricsSource. If you'd still prefer a metrics collection that
tracks something like "per-regionserver latency to one peer", we could add a "MetricsReplicationPeerSourceSource",
similar to MetricsReplicationGlobalSourceSource, for use with strategies like the randomly
bounded region group.
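
To make points 1, 3 and 4 concrete, here is a minimal Java sketch of the idea. It is not
the code in the patch: the class names (SourceRegistrySketch, Source) and the method
signatures are hypothetical stand-ins. Only the "peerId@groupId" id, the
"source.<id>.ageOfLastShippedOp" naming pattern, and the terminate-then-remove order come
from the description above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch only; these are NOT the real HBase classes.
final class SourceRegistrySketch {

    /** Stand-in for a replication source bound to one (peer, WAL group) pair. */
    static final class Source {
        final String peerId;
        final String groupId;
        boolean terminated = false;

        Source(String peerId, String groupId) {
            this.peerId = peerId;
            this.groupId = groupId;
        }

        /** Source id in the "peerId@groupId" form. */
        String id() {
            return peerId + "@" + groupId;
        }

        /** Metric name following the "source.<id>.ageOfLastShippedOp" pattern. */
        String ageOfLastShippedOpMetric() {
            return "source." + id() + ".ageOfLastShippedOp";
        }

        void terminate() {
            terminated = true;
        }
    }

    // Many sources (one per WAL group) may point at the same peer.
    private final List<Source> sources = new ArrayList<>();

    Source addSource(String peerId, String groupId) {
        Source s = new Source(peerId, groupId);
        sources.add(s);
        return s;
    }

    /** Terminates every source of the peer first, then removes it from the list. */
    List<Source> removePeer(String peerId) {
        List<Source> removed = new ArrayList<>();
        for (Iterator<Source> it = sources.iterator(); it.hasNext();) {
            Source s = it.next();
            if (s.peerId.equals(peerId)) {
                s.terminate();
                it.remove();
                removed.add(s);
            }
        }
        return removed;
    }

    int sourceCount() {
        return sources.size();
    }
}
```

With two WAL groups replicating to peer "1", removing that peer terminates and drops both
of its sources while the sources of other peers stay untouched.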

Feel free to let me know your thoughts.

> ReplicationSourceManager should be able to track multiple WAL paths
> -------------------------------------------------------------------
>
>                 Key: HBASE-6617
>                 URL: https://issues.apache.org/jira/browse/HBASE-6617
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>            Reporter: Ted Yu
>            Assignee: Yu Li
>             Fix For: 2.0.0, 1.3.0
>
>         Attachments: HBASE-6617.patch, HBASE-6617_v2.patch, HBASE-6617_v3.patch
>
>
> Currently ReplicationSourceManager uses logRolled() to receive notification about new HLog and remembers it in latestPath.
> When region server has multiple WAL support, we need to keep track of multiple Path's in ReplicationSourceManager



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)