hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yu Li <car...@gmail.com>
Subject Re: [VOTE] Merge branch HBASE-19397 back to master and branch-2.
Date Tue, 09 Jan 2018 02:58:31 GMT
+1 for both master and branch-2. IMHO flaky UT designs are some must-fix
once found, efforts will be paid sooner or later in stability testing (or
in production after release if our svt coverage is not good enough).

Best Regards,
Yu

On 9 January 2018 at 10:32, OpenInx <openinx@gmail.com> wrote:

> +1  for a merge to master firstly ( no-binding )
>
> If merge into the master branch as early as possible, we can develop
> follow-on features (table based features & sync-replication) as early as
> possible.
>
> On Tue, Jan 9, 2018 at 9:28 AM, Andrew Purtell <apurtell@apache.org>
> wrote:
>
> > FWIW, you have my +1 for a merge to master.
> >
> >
> >
> > On Mon, Jan 8, 2018 at 5:22 PM, 张铎(Duo Zhang) <palomino219@gmail.com>
> > wrote:
> >
> > > Anyway, if no objections on merging this into master, let's do it? So
> > that
> > > we can start working on the follow-on features, such as table based
> > > replication storage, and synchronous replication, etc.
> > >
> > > Thanks.
> > >
> > > 2018-01-09 7:19 GMT+08:00 Stack <stack@duboce.net>:
> > >
> > > > On Mon, Jan 8, 2018 at 2:50 PM, 张铎(Duo Zhang) <palomino219@gmail.com
> >
> > > > wrote:
> > > >
> > > > > This 'new' feature only changes DDL part, not the core part of
> > > > replication,
> > > > > i.e, how to read wal entries and how to replicate it to the remote
> > > > cluster,
> > > > > etc. And also there is no pb message/storage layout change, you can
> > > think
> > > > > of this as a big refactoring. Theoretically we even do not need to
> > add
> > > > new
> > > > > UTs for this feature, i.e, no extra stability works. The only
> visible
> > > > > change to users is that it may require more time on modifying peers
> > in
> > > > > shell. So in general I think it is less hurt to include it in the
> > > coming
> > > > > release?
> > > > >
> > > > > And why I think it SHOULD be included in our 2.0 release is that,
> the
> > > > > synchronous guarantee is really a good thing for our replication
> > > related
> > > > > UTs. The correctness of the current Test***Replication usually
> > depends
> > > > on a
> > > > > flakey condition - we will not do a log rolling between the
> > > modification
> > > > on
> > > > > zk and the actual loading of the modification on RS. And we have
> also
> > > > done
> > > > > a hard work to cleanup the lockings in ReplicationSourceManager and
> > > add a
> > > > > fat comment to say why it should be synchronized in this way. In
> > > general,
> > > > > the new code is much easier to read, test and debug, and also
> reduce
> > > the
> > > > > possibility of flakeyness, which could help us a lot when we start
> to
> > > > > stabilize our build.
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > You see it as a big bug fix Duo?
> > > >
> > > Kind of. Just like the AM v1, we can do lots of fix to make it more
> > stable,
> > > but we know that we can not fix all the problems since we store state
> in
> > > several places and it is a 'mission impossible' to make all the states
> > stay
> > > in sync under every situation... So we introduce AM v2.
> > > For the replication peer tracking, it is the same problem. It is hard
> to
> > do
> > > fencing with zk watcher since it is asynchronous, so the UTs are always
> > > kind of flakey in theoretical. And we depend on replication heavily at
> > > Xiaomi, it is always a pain for us.
> > >
> > > >
> > > > I'm late to review. Will have a look after beta-1 goes out. This
> stuff
> > > > looks great from outside, especially distributed procedure framework
> > > which
> > > > we need all over the place.
> > > >
> > > > In general I have no problem w/ this in master and an hbase 2.1 (2.1
> > > could
> > > > be soon after 2.0). Its late for big stuff in 2.0.... but let me
> take a
> > > > looksee sir.
> > > >
> > > Thanks sir. All the concerns here about whether we should merge this
> into
> > > 2.0 are reasonable, I know. Although I really want this in 2.0 because
> I
> > > believe it will help a lot for stabilizing,  I'm OK with merge it to
> 2.1
> > > only if you guys all think so.
> > >
> > > >
> > > > St.Ack
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > > 2018-01-09 4:53 GMT+08:00 Apekshit Sharma <appy@cloudera.com>:
> > > > >
> > > > > > Same questions as Josh's.
> > > > > > 1) We have RCs for beta1 now, which means only commits that
can
> go
> > in
> > > > are
> > > > > > bug fixes only. This change - although important, needed from
> long
> > > time
> > > > > and
> > > > > > well done (testing, summary, etc) - seems rather very large
to
> get
> > > into
> > > > > 2.0
> > > > > > now. Needs good justification why it has to be 2.1 instead of
> 2.0.
> > > > > >
> > > > > > -- Appy
> > > > > >
> > > > > >
> > > > > > On Mon, Jan 8, 2018 at 8:34 AM, Josh Elser <elserj@apache.org>
> > > wrote:
> > > > > >
> > > > > > > -0 From a general project planning point-of-view (not based
on
> > the
> > > > > > > technical merit of the code) I am uncomfortable about pulling
> in
> > a
> > > > > brand
> > > > > > > new feature after we've already made one beta RC.
> > > > > > >
> > > > > > > Duo -- can you expand on why this feature is so important
that
> we
> > > > > should
> > > > > > > break our release plan? Are there problems that would make
> > > including
> > > > > this
> > > > > > > in a 2.1/3.0 unnecessarily difficult? Any kind of color
you can
> > > > provide
> > > > > > on
> > > > > > > "why does this need to go into 2.0?" would be helpful.
> > > > > > >
> > > > > > >
> > > > > > > On 1/6/18 1:54 AM, Duo Zhang wrote:
> > > > > > >
> > > > > > >> https://issues.apache.org/jira/browse/HBASE-19397
> > > > > > >>
> > > > > > >> We aim to move the peer modification framework from
zk watcher
> > to
> > > > > > >> procedure
> > > > > > >> v2 in this issue and the work is done now.
> > > > > > >>
> > > > > > >> Copy the release note here:
> > > > > > >>
> > > > > > >> Introduce 5 procedures to do peer modifications:
> > > > > > >>
> > > > > > >>> AddPeerProcedure
> > > > > > >>> RemovePeerProcedure
> > > > > > >>> UpdatePeerConfigProcedure
> > > > > > >>> EnablePeerProcedure
> > > > > > >>> DisablePeerProcedure
> > > > > > >>>
> > > > > > >>> The procedures are all executed with the following
stage:
> > > > > > >>> 1. Call pre CP hook, if an exception is thrown
then give up
> > > > > > >>> 2. Check whether the operation is valid, if not
then give up
> > > > > > >>> 3. Update peer storage. Notice that if we have
entered this
> > > stage,
> > > > > then
> > > > > > >>> we
> > > > > > >>> can not rollback any more.
> > > > > > >>> 4. Schedule sub procedures to refresh the peer
config on
> every
> > > RS.
> > > > > > >>> 5. Do post cleanup if any.
> > > > > > >>> 6. Call post CP hook. The exception thrown will
be ignored
> > since
> > > we
> > > > > > have
> > > > > > >>> already done the work.
> > > > > > >>>
> > > > > > >>> The procedure will hold an exclusive lock on the
peer id, so
> > now
> > > > > there
> > > > > > is
> > > > > > >>> no concurrent modifications on a single peer.
> > > > > > >>>
> > > > > > >>> And now it is guaranteed that once the procedure
is done, the
> > > peer
> > > > > > >>> modification has already taken effect on all RSes.
> > > > > > >>>
> > > > > > >>> Abstracte a storage layer for replication peer/queue
> > manangement,
> > > > and
> > > > > > >>> refactored the upper layer to remove zk related
> > > > naming/code/comment.
> > > > > > >>>
> > > > > > >>> Add pre/postExecuteProcedures CP hooks to
> RegionServerObserver,
> > > and
> > > > > add
> > > > > > >>> permission check for executeProcedures method which
requires
> > the
> > > > > caller
> > > > > > >>> to
> > > > > > >>> be system user or super user.
> > > > > > >>>
> > > > > > >>> On rolling upgrade: just do not do any replication
peer
> > > > modifications
> > > > > > >>> during the rolling upgrading. There is no pb/layout
changes
> on
> > > the
> > > > > > >>> peer/queue storage on zk.
> > > > > > >>>
> > > > > > >>>
> > > > > > >> And there are other benefits.
> > > > > > >> First, we have introduced a general procedure framework
to
> send
> > > > tasks
> > > > > to
> > > > > > >> RS
> > > > > > >> and report the report back to Master. It can be used
to
> > implement
> > > > > other
> > > > > > >> operations such as ACL change.
> > > > > > >> Second, zk is used as a external storage now since
we do not
> > > depend
> > > > on
> > > > > > zk
> > > > > > >> watcher any more, it will be much easier to implement
a 'table
> > > > based'
> > > > > > >> replication peer/queue storage.
> > > > > > >>
> > > > > > >> Please vote:
> > > > > > >> [+1] Agree
> > > > > > >> [-1] Disagree
> > > > > > >> [0] Neutral
> > > > > > >>
> > > > > > >> Thanks.
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > -- Appy
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> > decrepit hands
> >    - A23, Crosstalk
> >
>
>
>
> --
> ==============================
> Openinx  blog : http://openinx.github.io
>
> TO BE A GREAT HACKER !
> ==============================
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message