hbase-user mailing list archives

From Kahlil Oppenheimer <kahliloppenhei...@gmail.com>
Subject HBase Replication vs Read Replicas
Date Tue, 10 Oct 2017 14:55:29 GMT
Hi All,

I have some questions about when to use HBase Replication vs. HBase Read
Replicas. They seem to accomplish similar-ish things, and I'm trying to
figure out which I should use.

I've read through the documentation, but I am confused on a few points. It
seems that HBase Replication can have very high replication latency (on the
order of minutes). My application can tolerate a rough maximum of 60s of
replication latency, so that would be problematic for me.

Read Replicas seem to have quite low (configurable) replication latency,
but do not seem to lend themselves to cross-datacenter replication. For
instance, I don't see a way to have Replica 1 in Datacenter A and Replica 2
in Datacenter B while allowing clients to say "Read only from Datacenter A"
vs. "Read only from Datacenter B".

My use case is that I have a table I'd like to replicate between data
centers A and B. It is OK if all writes can only go through one data center
(say, A). However, all clients should be able to read from either A or B.
In particular, I'd like for some clients to be able to specifically say
they'd like to read from A and others to say they'd like to read from B,
for any given row key.

It is also OK if the data coming from one of these reads is stale, so long
as it is no more than 60s stale and the client has some indication that
the data may not be up to date.
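
To make that requirement concrete, here is a minimal sketch of the check I
have in mind. It is plain Java and does not use any HBase API: the class
name, the Verdict enum, and the lastSyncedAt timestamp (when the
datacenter being read last caught up with the writer) are all hypothetical
names I made up for illustration. Only the 60s bound and the A/B setup come
from the description above.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of the requirement; none of these names are HBase APIs.
public class StaleReadPolicy {
    // Upper bound on acceptable staleness, taken from the 60s requirement.
    static final Duration MAX_STALENESS = Duration.ofSeconds(60);

    enum Verdict { FRESH, POSSIBLY_STALE, TOO_STALE }

    // readDc: datacenter the client asked to read from.
    // writeDc: the single datacenter that accepts writes.
    // lastSyncedAt: assumed timestamp of the last edit applied in readDc.
    static Verdict classify(String readDc, String writeDc,
                            Instant lastSyncedAt, Instant now) {
        if (readDc.equals(writeDc)) {
            return Verdict.FRESH; // reads from the write side are never stale
        }
        Duration lag = Duration.between(lastSyncedAt, now);
        return lag.compareTo(MAX_STALENESS) <= 0
                ? Verdict.POSSIBLY_STALE  // acceptable, but flag it to the client
                : Verdict.TOO_STALE;      // outside the 60s bound
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println(classify("A", "A", now, now));                   // FRESH
        System.out.println(classify("B", "A", now.minusSeconds(15), now));  // POSSIBLY_STALE
        System.out.println(classify("B", "A", now.minusSeconds(90), now));  // TOO_STALE
    }
}
```

In other words, a read from B is fine as long as it is flagged and B is at
most 60s behind A; how that lag would actually be obtained from HBase is
part of what I am asking.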

Because of the 60s staleness constraint, it seems like HBase Replication
may not be the proper tool to use here, since it appears to have higher
replication latency and to be geared more towards Disaster Recovery than
High Availability.

Read Replicas seem like the proper solution here, but the Timeline
consistency model doesn't seem to let me say "Read from datacenter B"; it
just says "Try to read from all data-centers and return B if it gets back
first". Furthermore, there doesn't seem to be an obvious way to force the
region replicas to be hosted in datacenter B.

What would you all recommend? Am I misunderstanding either of these HBase
features, or is there a more intuitive feature of HBase I should reference
to solve this problem?

For what it's worth, I'm running the CDH-5.9-1.2.0 version of HBase.

Many thanks,
Kahlil
