hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Re: Re: Re: What way to improve MTTR other than DLR(distributed log replay)
Date Wed, 19 Oct 2016 01:58:05 GMT
Allan:
One factor to consider is that the assignment manager in hbase 2.0 would be
quite different from those in 0.98 and 1.x branches.

Meaning, you may need to come up with two solutions for a single problem.

FYI

On Tue, Oct 18, 2016 at 6:11 PM, Allan Yang <allanwin@163.com> wrote:

> Hi, Ted
> These issues I mentioned above (HBASE-13567, HBASE-12743, HBASE-13535,
> HBASE-14729) are ALL reproduced in our HBase 1.x test environment. Fixing
> them is exactly what I'm going to do. I haven't found the root cause yet,
> but I will update if I find solutions.
> What I am afraid of is that there are other issues I don't know about yet.
> So if you or others know of other issues related to DLR, please let me know.
>
>
> Regards
> Allan Yang
>
>
>
>
>
>
>
> At 2016-10-19 00:19:06, "Ted Yu" <yuzhihong@gmail.com> wrote:
> >Allan:
> >I wonder how you deal with open issues such as HBASE-13535.
> >From your description, it seems your team fixed more DLR issues.
> >
> >Cheers
> >
> >On Mon, Oct 17, 2016 at 11:37 PM, allanwin <allanwin@163.com> wrote:
> >
> >>
> >>
> >>
> >> Here is the thing. We have backported DLR (HBASE-7006) to our 0.94
> >> clusters in our production environment (of course a lot of bugs were
> >> fixed, and it is working well). It has proven to be a huge gain. When a
> >> large cluster crashes, the MTTR improves from several hours to less than
> >> an hour. Now we want to move on to HBase 1.x, and we still want DLR. This
> >> time, we don't want to backport the 'backported' DLR to HBase 1.x, but it
> >> seems that the community has decided to remove DLR...
> >>
> >>
> >> The DLR feature is proven useful in our production environment, so I
> >> think I will try to fix its issues in branch-1.x
> >>
> >>
> >>
> >>
> >>
> >>
> >> At 2016-10-18 13:47:17, "Anoop John" <anoop.hbase@gmail.com> wrote:
> >> >Agree with your observation. But we wanted the DLR feature removed
> >> >because it is known to have issues. Otherwise we would need major work
> >> >to correct all these issues.
> >> >
> >> >-Anoop-
> >> >
> >> >On Tue, Oct 18, 2016 at 7:41 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >> >> If you have a cluster, I suggest you turn on DLR and observe the
> >> >> effect where fewer than half the region servers are up after the
> >> >> crash.
> >> >> You would have first hand experience that way.
> >> >>
> >> >> On Mon, Oct 17, 2016 at 6:33 PM, allanwin <allanwin@163.com> wrote:
> >> >>
> >> >>>
> >> >>>
> >> >>>
> >> >>> Yes, region replica is a good way to improve MTTR. Especially if one
> >> >>> or two servers are down, region replicas can improve data
> >> >>> availability. But for a big disaster, like 1/3 or 1/2 of the region
> >> >>> servers shutting down, I think DLR is still useful to bring regions
> >> >>> online more quickly and with less IO usage.
> >> >>>
> >> >>>
> >> >>> Regards
> >> >>> Allan Yang
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> At 2016-10-17 21:01:16, "Ted Yu" <yuzhihong@gmail.com> wrote:
> >> >>> >Here was the thread discussing DLR:
> >> >>> >
> >> >>> >http://search-hadoop.com/m/YGbbOxBK2n4ES12&subj=Re+DISCUSS+retiring+current+DLR+code
> >> >>> >
> >> >>> >> On Oct 17, 2016, at 4:15 AM, allanwin <allanwin@163.com> wrote:
> >> >>> >>
> >> >>> >> Hi, All
> >> >>> >> DLR can improve MTTR dramatically, but since it has many bugs
> >> >>> >> like HBASE-13567, HBASE-12743, HBASE-13535, HBASE-14729 (any more
> >> >>> >> I don't know of?), it has proved unreliable and has been
> >> >>> >> deprecated in almost all branches now.
> >> >>> >>
> >> >>> >>
> >> >>> >> My question is, is there any way other than DLR to improve MTTR?
> >> >>> >> Because if a big cluster crashes, it takes a long time to bring
> >> >>> >> regions online, not to mention that it will create huge pressure
> >> >>> >> on IO.
> >> >>> >>
> >> >>> >>
> >> >>> >> To tell the truth, I still want DLR back. If the community
> >> >>> >> doesn't have any plan to bring back DLR, I may want to figure out
> >> >>> >> the problems in DLR and make it working and reliable. Any
> >> >>> >> suggestions for that?
> >> >>> >>
> >> >>> >>
> >> >>> >> sincerely
> >> >>> >> Allan Yang
> >> >>>
> >>
>
