hbase-user mailing list archives

From Ishan Chhabra <ichha...@rocketfuel.com>
Subject Re: Setting up NxN replication
Date Sat, 09 Nov 2013 03:36:09 GMT
Demai,

I see. That is a good suggestion to add redundancy, but it doubles the network
traffic and also doubles the WAL edits. Also, after HBASE-7709, HBase stores a
list of cluster-ids, and this list will grow very long in this case, possibly
making the WALEdits heavy.

I am now inclined to implement what I described in the first post, but am not
sure if it would be useful upstream. I'll file a JIRA and see.
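
For reference, a minimal sketch of that filter (names hypothetical; written
against the 0.94-era HLogKey, whose getClusterId() returns the originating
cluster id — after HBASE-7709 the key carries a list of consumed cluster-ids
instead, so a real patch would consult the originating entry of that list):

    import java.util.UUID;
    import org.apache.hadoop.hbase.regionserver.wal.HLog;

    // Proposed check inside ReplicationSource's shipping loop: ship only
    // entries that originated on the local cluster, so relayed edits are
    // never forwarded a second time.
    final class OriginFilter {
      static boolean shouldShip(HLog.Entry entry, UUID localClusterId) {
        // getClusterId() is the 0.94-era accessor on HLogKey.
        return localClusterId.equals(entry.getKey().getClusterId());
      }
    }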

In any case, thanks for the wonderful discussion. I'll report back here on
what I did and whether it worked.


On Fri, Nov 8, 2013 at 6:55 PM, Demai Ni <nidmgg@gmail.com> wrote:

> Ishan,
>
> "Coming to Demai’s suggestion of M-M to 2 instead of 9, i still want to
> have
> the data available from 1 to all clusters. How would I do it with your
> setup?".
>
> If I understand the requirement correctly, your setup is almost there:
> C1 <-> C2 <-> C3 <-> C4  and *C4 <-> C1*
> Basically, a doubly-linked list forming a cycle. This way there is no single
> point of failure, and writes on any of the clusters will eventually be
> replicated to all the clusters. The good part is that for each write,
> although the total number of writes is the same as NxN, each cluster only
> needs to handle at most 2. That said, I have never set up more than 3
> clusters, and have to assume no other bugs similar to HBASE-7709 (loop in
> Master/Master replication) come out of this.
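>
> (For concreteness, a sketch of wiring one node of the ring with the Java
> client API; the cluster keys below are placeholders, and each cluster adds
> only its two neighbors:)
>
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;
>
>     // On C1: peer with the two ring neighbors only (C2 and C4);
>     // repeat the same pattern on C2, C3 and C4.
>     public class RingPeers {
>       public static void main(String[] args) throws Exception {
>         ReplicationAdmin admin =
>             new ReplicationAdmin(HBaseConfiguration.create());
>         admin.addPeer("2", "c2-zk:2181:/hbase"); // next neighbor
>         admin.addPeer("4", "c4-zk:2181:/hbase"); // previous neighbor
>       }
>     }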
>
> Still, I don't have a good solution for "..a row should be present in only
> 4/10 clusters..". One approach would use more than one column family, plus
> either HBASE-5002 (control replication peer per column family) or
> HBASE-8751. Unfortunately, neither of those JIRAs has been resolved yet. My
> 2 cents.
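>
> (The knob that exists today is the per-column-family REPLICATION_SCOPE,
> which is all-or-nothing across peers; per-peer control is exactly what
> those JIRAs would add. A sketch of the existing flag, family name made up:)
>
>     import org.apache.hadoop.hbase.HColumnDescriptor;
>     import org.apache.hadoop.hbase.HConstants;
>
>     // Scope 1 replicates the family to every configured peer,
>     // scope 0 to none; there is no per-peer choice yet.
>     public class ScopeExample {
>       public static void main(String[] args) {
>         HColumnDescriptor cf = new HColumnDescriptor("critical_cf");
>         cf.setScope(HConstants.REPLICATION_SCOPE_GLOBAL); // = 1
>       }
>     }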
>
> Demai
>
>
> On Fri, Nov 8, 2013 at 4:38 PM, Ishan Chhabra <ichhabra@rocketfuel.com> wrote:
>
> > Demai, Ted:
> >
> > Thanks for the detailed answer.
> >
> > I should add some more context here. The underlying network is an NxN
> > mesh. The "cost" for each link is the same.
> >
> > Coming to Demai's suggestion of M-M to 2 instead of 9, I still want to
> > have the data available from 1 to all clusters. How would I do it with
> > your setup?
> >
> > For the difference between MST and NxN:
> > Consider the following example, with 4 clusters: C1, C2, C3, C4, and a
> > write going to C1.
> >
> > In the NxN mesh, the write will be propagated as:
> > C1 -> C2
> > C1 -> C3
> > C1 -> C4
> >
> > Network cost: 3, writes to WAL: 3.
> >
> > With MST, with the tree as C1 <-> C2 <-> C3 <-> C4, the write will be
> > propagated as:
> > C1 -> C2
> > C2 -> C3
> > C3 -> C4
> >
> > Network cost: 3, writes to WAL: 3.
> >
> > Both approaches have the same network and WAL cost. The only difference
> > is that in MST, if C2 fails, writes from C1 will not reach C3 and C4,
> > whereas in the NxN case, the writes will still happen.
> >
> > Also, (1) and (3) are not an issue for us.
> >
> > Having said that, I do realize that adding more clusters increases the
> > load quadratically, and that does worry me. Our actual use case is that a
> > row should be present in only 4/10 clusters, but which ones varies based
> > on the row and not on the cluster, so I cannot come up with a static
> > replication configuration that will handle that. I am looking into
> > per-row replication, but will start a separate discussion for that and
> > share my ideas there.
> >
> > I hope this makes more sense now.
> >
> >
> > On Fri, Nov 8, 2013 at 3:47 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > bq. what if your company opens a new office in an 11th location?
> > >
> > > With the minimum spanning tree approach, the increase in load wouldn't
> > > be quadratic.
> > >
> > >
> > > On Fri, Nov 8, 2013 at 2:58 PM, Demai Ni <nidmgg@gmail.com> wrote:
> > >
> > > > Ishan,
> > > >
> > > > I have to admit that I am a bit surprised about the need to have data
> > > > centers in 10 different locations. Well, I guess I shouldn't be, as
> > > > every company is global now (anyone from Mars yet?).
> > > >
> > > > In your case, since there is only one column family, the headache is
> > > > not as bad. Let's call your clusters C1, C2, ... C10.
> > > >
> > > > The safest way for your most critical data is still to set up M-M
> > > > replication from 1 to N-1. That is, every cluster adds the rest of the
> > > > clusters as its peers. For example, C1 will have C2, C3, ... C10 as
> > > > its peers; C2 will have C1, C3, ... C10. Well, that will be a lot of
> > > > data over the network, although it is the best/fastest way to get all
> > > > the clusters synced up. I don't like the idea at all (too expensive,
> > > > for one).
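> > > >
> > > > (Just to make that concrete, the 1-to-(N-1) setup on a single cluster
> > > > would look roughly like this sketch; host names are placeholders:)
> > > >
> > > >     import org.apache.hadoop.hbase.HBaseConfiguration;
> > > >     import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;
> > > >
> > > >     // On C1: add each of the other nine clusters as a peer.
> > > >     public class MeshPeers {
> > > >       public static void main(String[] args) throws Exception {
> > > >         ReplicationAdmin admin =
> > > >             new ReplicationAdmin(HBaseConfiguration.create());
> > > >         for (int i = 2; i <= 10; i++) {
> > > >           admin.addPeer(String.valueOf(i), "c" + i + "-zk:2181:/hbase");
> > > >         }
> > > >       }
> > > >     }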
> > > >
> > > > Now, let's improve it a bit: C1 will set up M-M to 2 of the remaining
> > > > 9, with the distribution carefully planned so that all the clusters
> > > > get an equal load. Well, a system administrator has to do it manually.
> > > >
> > > > Now, thinking about the headaches:
> > > > 1) What if your company (that is, your manager, who has no idea how
> > > > difficult it is) decides to have one more column family replicated?
> > > > How about two more? The load keeps growing.
> > > > 2) What if your company opens a new office in an 11th location? Again,
> > > > the load grows quadratically.
> > > > 3) Let's say you are the best administrator and keep nice records of
> > > > everything (unfortunately, HBase alone doesn't have a good way to
> > > > maintain a record of who is being replicated where). And then the
> > > > admin leaves the company? Or this is a global company with 10 admins
> > > > at different locations. How do they communicate the replication setup?
> > > >
> > > > :-) Well, (3) is not too bad. I just like to point it out, as it can
> > > > be quite true for a company large enough to have 10 locations.
> > > >
> > > > Demai
> > > >
> > > > On Fri, Nov 8, 2013 at 2:42 PM, Ishan Chhabra <ichhabra@rocketfuel.com> wrote:
> > > >
> > > > > Ted:
> > > > > Yes. It is the same table that is being written to from all
> > > > > locations. A single row could be updated from multiple locations,
> > > > > but our schema is designed in a manner that writes will be
> > > > > independent and not clobber each other.
> > > > >
> > > > >
> > > > > On Fri, Nov 8, 2013 at 2:33 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >
> > > > > > Ishan:
> > > > > > In your use case, the same table is written to in 10 clusters at
> > > > > > roughly the same time?
> > > > > >
> > > > > > Please clarify.
> > > > > >
> > > > > >
> > > > > > On Fri, Nov 8, 2013 at 2:29 PM, Ishan Chhabra <ichhabra@rocketfuel.com> wrote:
> > > > > >
> > > > > > > @Demai,
> > > > > > > We actually have 10 clusters in different locations.
> > > > > > > The replication scope is not an issue for me since I have only
> > > > > > > one column family and we want it replicated to each location.
> > > > > > > Can you elaborate more on why a replication setup of more than
> > > > > > > 3-4 clusters would be a headache, in your opinion?
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Nov 8, 2013 at 2:16 PM, Ishan Chhabra <ichhabra@rocketfuel.com> wrote:
> > > > > > >
> > > > > > > > @Demai,
> > > > > > > > Writes from B should also go to A and C. So, if I were to
> > > > > > > > continue on your suggestion, I would set up A-B master-master
> > > > > > > > and B-C master-master, which is what I was proposing in the
> > > > > > > > 2nd approach (MST-based).
> > > > > > > >
> > > > > > > > @Vladimir
> > > > > > > > That is classified. :P
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Nov 8, 2013 at 1:20 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
> > > > > > > >
> > > > > > > >> *I want to set up NxN replication, i.e. N clusters each
> > > > > > > >> replicating to each other. N is expected to be around 10.*
> > > > > > > >>
> > > > > > > >> Preparing for thermonuclear war?
> > > > > > > >>
> > > > > > > >> On Fri, Nov 8, 2013 at 1:14 PM, Ishan Chhabra <ichhabra@rocketfuel.com> wrote:
> > > > > > > >>
> > > > > > > >> > I want to set up NxN replication, i.e. N clusters each
> > > > > > > >> > replicating to each other. N is expected to be around 10.
> > > > > > > >> >
> > > > > > > >> > On doing some research, I realize it is possible after the
> > > > > > > >> > HBASE-7709 fix, but it would lead to much more data flowing
> > > > > > > >> > in the system, e.g.:
> > > > > > > >> >
> > > > > > > >> > Let's say we have 3 clusters: A, B and C.
> > > > > > > >> > A new write to A will go to B and then C, and also go to C
> > > > > > > >> > directly via the direct path. This leads to unnecessary
> > > > > > > >> > network usage and writes to the WAL of B that should be
> > > > > > > >> > avoided. Now imagine this with 10 clusters; it won't scale.
> > > > > > > >> >
> > > > > > > >> > One option is to create a minimum spanning tree joining
> > > > > > > >> > all the clusters and make nodes replicate to their
> > > > > > > >> > immediate peers in a master-master fashion. This is much
> > > > > > > >> > better than the NxN mesh, but still has extra network and
> > > > > > > >> > WAL usage. It also suffers from a failure scenario where a
> > > > > > > >> > single cluster going down will pause replication to
> > > > > > > >> > clusters downstream.
> > > > > > > >> >
> > > > > > > >> > What I really want is for the ReplicationSource to only
> > > > > > > >> > forward WALEdits whose cluster-id is the same as the local
> > > > > > > >> > cluster-id. This seems like a straightforward patch to put
> > > > > > > >> > in.
> > > > > > > >> >
> > > > > > > >> > Any thoughts on the suggested approach or alternatives?
> > > > > > > >> >
> > > > > > > >> > --
> > > > > > > >> > *Ishan Chhabra *| Rocket Scientist | RocketFuel Inc.
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > *Ishan Chhabra *| Rocket Scientist | RocketFuel Inc.
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > *Ishan Chhabra *| Rocket Scientist | RocketFuel Inc.
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > *Ishan Chhabra *| Rocket Scientist | RocketFuel Inc.
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > *Ishan Chhabra *| Rocket Scientist | RocketFuel Inc.
> >
>



-- 
*Ishan Chhabra *| Rocket Scientist | RocketFuel Inc.
