hbase-user mailing list archives

From allanwin <allan...@163.com>
Subject Re:Re: Re: What way to improve MTTR other than DLR(distributed log replay)
Date Tue, 18 Oct 2016 06:37:44 GMT



Here is the thing. We have backported DLR (HBASE-7006) to our 0.94 clusters in the production
environment (of course, we fixed a lot of bugs, and it is working well). It has proven to be
a huge gain: after a large cluster crash, MTTR dropped from several hours to less than
an hour. Now we want to move on to HBase 1.x, and we still want DLR. This time, we don't
want to port our 'backported' DLR to HBase 1.x, but it seems that the community has
decided to remove DLR...


The DLR feature has proven useful in our production environment, so I think I will try to fix
its issues in branch-1.x.






At 2016-10-18 13:47:17, "Anoop John" <anoop.hbase@gmail.com> wrote:
>Agree with your observation. But we wanted the DLR feature removed,
>because it is known to have issues. Otherwise, we would need major work to
>correct all these issues.
>
>-Anoop-
>
>On Tue, Oct 18, 2016 at 7:41 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>> If you have a cluster, I suggest you turn on DLR and observe its effect
>> in a scenario where fewer than half of the region servers are up after the crash.
>> That way you would have first-hand experience.
>>
>> On Mon, Oct 17, 2016 at 6:33 PM, allanwin <allanwin@163.com> wrote:
>>
>>>
>>>
>>>
>>> Yes, region replicas are a good way to improve MTTR. Especially if one or two
>>> servers are down, region replicas can improve data availability. But for a big
>>> disaster, like a third or half of the region servers shutting down, I think DLR is
>>> still useful for bringing regions online more quickly and with less IO usage.
>>>
>>>
>>> Regards
>>> Allan Yang
>>>
>>>
>>>
>>>
>>>
>>>
>>> At 2016-10-17 21:01:16, "Ted Yu" <yuzhihong@gmail.com> wrote:
>>> >Here was the thread discussing DLR:
>>> >
>>> >http://search-hadoop.com/m/YGbbOxBK2n4ES12&subj=Re+DISCUSS+retiring+current+DLR+code
>>> >
>>> >> On Oct 17, 2016, at 4:15 AM, allanwin <allanwin@163.com> wrote:
>>> >>
>>> >> Hi, All
>>> >>  DLR can improve MTTR dramatically, but since it has many bugs, like
>>> HBASE-13567, HBASE-12743, HBASE-13535 and HBASE-14729 (are there more I don't know of?),
>>> it has proven unreliable and is now deprecated in almost all branches.
>>> >>
>>> >>
>>> >> My question is: is there any way other than DLR to improve MTTR?
>>> When a big cluster crashes, it takes a long time to bring regions
>>> online, not to mention the huge pressure it puts on IO.
>>> >>
>>> >>
>>> >> To tell the truth, I still want DLR back. If the community doesn't have
>>> any plan to bring DLR back, I may try to figure out the problems in DLR
>>> and make it work reliably. Any suggestions for that?
>>> >>
>>> >>
>>> >> Sincerely
>>> >> Allan Yang
>>>