hbase-dev mailing list archives

From Matteo Bertozzi <theo.berto...@gmail.com>
Subject Re: [DISCUSS] No regions on Master node in 2.0
Date Fri, 08 Apr 2016 15:43:53 GMT
I think proc-v2 makes things easier than having meta hard-coded on the master.
We just read the WAL and we get back to the state we were in previously.
In this case it doesn't make any difference whether meta is on the master or
remote, or whether we have one meta region or a hundred.
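A toy model of the replay idea, in Python, for illustration only. The record layout, the step names (`OPENING`, `OPENED`, `DONE`) and the region names are hypothetical and do not reflect the actual proc-v2 procedure store format. It shows that rebuilding in-flight operation state needs only the ordered WAL, not a read of hbase:meta, so where meta is hosted does not enter into recovery.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalRecord:
    # Hypothetical record: which procedure, which region, which step it reached.
    proc_id: int
    region: str
    step: str  # hypothetical step names: "OPENING", "OPENED", "DONE"


def replay(wal):
    """Rebuild in-flight procedure state by replaying WAL records in order.

    A later record for the same procedure overwrites the earlier one, so the
    result holds the last step each unfinished procedure reached before the
    crash. Procedures that reached the terminal "DONE" step are dropped.
    No lookup against hbase:meta is needed at any point.
    """
    in_flight = {}
    for rec in wal:
        if rec.step == "DONE":
            in_flight.pop(rec.proc_id, None)
        else:
            in_flight[rec.proc_id] = (rec.region, rec.step)
    return in_flight


if __name__ == "__main__":
    wal = [
        WalRecord(1, "hbase:meta,,1", "OPENING"),
        WalRecord(2, "t1,,1", "OPENING"),
        WalRecord(1, "hbase:meta,,1", "DONE"),
        WalRecord(2, "t1,,1", "OPENED"),
    ]
    print(replay(wal))  # {2: ('t1,,1', 'OPENED')}
```

The same replay works whether the meta region in the log was opened on the master or on a remote regionserver, which is the point being made.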

If we hard-code meta, we need special logic to load it and, from there,
start the bootstrap of the other regions.
Then there is no way to switch to multiple metas if someone wants that,
unless we keep two code paths, one of which will be proc-v2.
So at that point we should just keep a single code path that does both.
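A sketch of what "a single code path that does both" could look like, in Python. `MetaLocator` is a hypothetical helper, not an HBase class, and the server names are made up. It maps a row key to whichever meta region covers it; an unsplit meta is just the degenerate case of one region starting at the empty key, so the same lookup serves one meta region or many, with the master being only one possible host.

```python
import bisect


class MetaLocator:
    """Hypothetical locator mapping a row key to the meta region covering it.

    meta_regions is a list of (start_key, server) pairs. A single unsplit
    meta is the degenerate case [(b"", server)], so one code path handles
    one meta region or many, wherever they are hosted.
    """

    def __init__(self, meta_regions):
        ordered = sorted(meta_regions)
        if not ordered or ordered[0][0] != b"":
            raise ValueError("the first meta region must start at the empty key")
        self._starts = [start for start, _ in ordered]
        self._servers = [server for _, server in ordered]

    def locate(self, row_key: bytes) -> str:
        # Pick the rightmost meta region whose start key is <= row_key.
        idx = bisect.bisect_right(self._starts, row_key) - 1
        return self._servers[idx]


if __name__ == "__main__":
    single = MetaLocator([(b"", "master-host")])
    split = MetaLocator([(b"", "rs1-host"), (b"m", "rs2-host")])
    print(single.locate(b"t1,row1"))  # master-host
    print(split.locate(b"a"), split.locate(b"z"))  # rs1-host rs2-host
```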


On Fri, Apr 8, 2016 at 8:27 AM, Jimmy Xiang <jxcn01@gmail.com> wrote:

> One thing I'd like to say is that putting system tables on the master
> makes master startup much simpler and more reliable.
>
> Even if proc-v2 can solve the problem, it makes things complicated,
> right? I prefer to be sure that meta is always available, in a
> consistent state.
>
> If we really need to split meta, we should have an option for most
> users to have just one meta region, and keep it on the master.
>
>
> On Fri, Apr 8, 2016 at 8:03 AM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
> > # Without meta on master, we double assign and lose data.
> >
> > I doubt meta on master solves this problem.
> > This has more to do with the fact that balancer, assignment, split and
> > merge are disjoint operations that are not aware of each other.
> > Also, those operations in general consist of multiple steps, and if the
> > master crashes you may end up in an inconsistent state.
> >
> > This is what proc-v2 should solve. Since we are aware of each operation,
> > there is no chance of double assignment and similar problems, by design.
> >
> > The master doesn't need the full meta to operate properly;
> > it just needs the "state" (at which point of the operation am I?),
> > which is the WAL of proc-v2. Given that, we can split meta or keep meta
> > remote without any problem, since we only have one update to meta:
> > updating the location when the assignment is completed.
> >
> > Also, at the moment the master has a copy of the information in meta:
> > a map with the RegionInfo, state and locations. But we are still doing
> > a query on meta instead of using that local map directly.
> > If we move meta onto the master we can remove that extra copy, but that
> > will tie meta and master together, making it impossible to offload meta
> > if we need to.
> >
> >
> > In my opinion, with the new assignment you have all the main problems
> > solved. We can keep regions on the master as we have now,
> > so you can configure it to get more performance (avoid the remote RPC).
> > But our design should allow meta to be split and to be hosted somewhere
> > else.
> >
> > Matteo
> >
> >
> > On Fri, Apr 8, 2016 at 2:08 AM, 张铎 <palomino219@gmail.com> wrote:
> >
> >> Agree on the performance concerns. IMO we should not hurt the
> >> performance of small (maybe normal?) clusters when scaling for huge
> >> clusters. And I also agree that the current implementation, which allows
> >> the Master to carry system regions, is not good (sorry for the
> >> Chinglish...). At the very least, it makes master startup really
> >> complicated.
> >>
> >> So IMO, we should let the master process or master machine also carry
> >> system regions, but in another way. Start another RS instance on the
> >> same machine or in the same JVM? Or build a new storage based on the
> >> procedure store and convert it to a normal table when it grows too large?
> >>
> >> Thanks.
> >>
> >> 2016-04-08 16:42 GMT+08:00 Elliott Clark <eclark@apache.org>:
> >>
> >> > # Without meta on master, we double assign and lose data.
> >> >
> >> > That is currently a fact that I have seen over and over on multiple
> >> > loaded clusters. Weighing some abstract cleanup of deployment against
> >> > losing data is a no-brainer for me. Master assignment, region split and
> >> > region merge are all risky, and are all places where HBase can lose
> >> > data. Meta being hosted on the master makes communication easier and
> >> > less flaky. Run ITBLL in a loop that creates a new table every time:
> >> > without meta on master, everything will fail pretty reliably in ~2 days.
> >> > With meta on master, things pass MUCH more often.
> >> >
> >> > # Master hosting the system tables locates the system tables as close as possible to the machine that will be mutating the data.
> >> >
> >> > Data locality is something that we all work for: short-circuit local
> >> > reads, caching blocks in the JVM, etc. Bringing data closer to the
> >> > interested party has a long history of making things faster and better.
> >> > The master is in charge of just about all mutations of all system
> >> > tables. It's in charge of changing meta, changing ACLs, creating new
> >> > namespaces, etc. So put the memstore as close as possible to the system
> >> > that's going to mutate meta.
> >> >
> >> > # If you want to make meta faster, moving it to other regionservers makes things worse.
> >> >
> >> > Meta can get pretty hot. Putting it with other regions that clients
> >> > will be trying to access makes everything worse. It means that meta is
> >> > competing with user requests. Either meta gets served and other
> >> > requests don't, causing more requests to meta, or requests to user
> >> > regions get served and other clients get starved.
> >> > At FB we've seen read throughput to meta double or more by moving it
> >> > to the master. Writes to meta are also much faster since there's no RPC
> >> > hop, no queueing, no fighting with reads. So far it has been the single
> >> > biggest thing to make meta faster.
> >> >
> >> >
> >> > On Thu, Apr 7, 2016 at 10:11 PM, Stack <stack@duboce.net> wrote:
> >> >
> >> > > I would like to start a discussion on whether the Master should be
> >> > > carrying regions or not. No hurry. I see this thread going on a while,
> >> > > and what with 2.0 being a ways out yet, there is no need to rush to a
> >> > > decision.
> >> > >
> >> > > First, some background.
> >> > >
> >> > > Currently in the master branch, HMaster hosts 'system tables': e.g.
> >> > > hbase:meta. HMaster is doing more than just gardening the cluster,
> >> > > bootstrapping and keeping all up and serving healthy as in branch-1;
> >> > > in the master branch, it is actually in the write path for the most
> >> > > critical system regions.
> >> > >
> >> > > Master is this way because HMaster and HRegionServer servers have so
> >> > > much in common that they should be just one binary, w/ HMaster as any
> >> > > other server, with the HMaster function a minor appendage runnable by
> >> > > any running HRegionServer.
> >> > >
> >> > > I like this idea, but the unification work was just never finished.
> >> > > What is in the master branch is a compromise. HMaster is not a
> >> > > RegionServer but a sort-of RegionServer doing part serving. So we have
> >> > > the HMaster role, a new part-RegionServer-carrying-special-regions role
> >> > > and then a full-on HRegionServer role. We need to fix this messiness.
> >> > > We could revert to plain branch-1 roles or carry the
> >> > > HMaster-function-is-something-any-RegionServer-could-execute idea
> >> > > through to completion.
> >> > >
> >> > > More background, from a time long past, with good comments by the
> >> > > likes of our Francis Liu and Mighty Matteo Bertozzi, is here [1], on
> >> > > unifying master and meta-serving. Slightly related are old discussions
> >> > > on being able to scale by splitting meta, with good comments by our
> >> > > Elliott Clark [2].
> >> > >
> >> > > Also for consideration, the landscape has since changed. [1] was
> >> > > written before we had ProcedureV2 available to us, where we could
> >> > > record intermediate transition states local to the Master rather than
> >> > > remotely, as intermediate updates over RPC to an hbase:meta running on
> >> > > another node.
> >> > >
> >> > > Enough on the background.
> >> > >
> >> > > Let me provoke discussion by making the statement that we should undo
> >> > > HMaster carrying any regions ever; that the HMaster function is work
> >> > > enough for a single dedicated server, and that it is important enough
> >> > > that it cannot take a background role on a serving RegionServer (I
> >> > > could back off from this position given evidence that the HMaster role
> >> > > could be backgrounded). Notions of a Master carrying system tables only
> >> > > are just not on, given system tables will be too big for a single
> >> > > server, especially when hbase:meta is split (so we can scale). This
> >> > > simple distinction of HMaster and RegionServer roles is also what our
> >> > > users know and have gotten used to, so there needs to be a good reason
> >> > > to change it (we can still pursue the single binary that can do the
> >> > > HMaster or HRegionServer role, determined at runtime).
> >> > >
> >> > > Thanks,
> >> > > St.Ack
> >> > >
> >> > > 1. https://docs.google.com/document/d/1xC-bCzAAKO59Xo3XN-Cl6p-5CM_4DMoR-WpnkmYZgpw/edit#heading=h.j5yqy7n04bkn
> >> > > 2. https://docs.google.com/document/d/1eCuqf7i2dkWHL0PxcE1HE1nLRQ_tCyXI4JsOB6TAk60/edit#heading=h.80vcerzbkj93
> >> > >
> >> >
> >>
>
