hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: HBASE-4060 - TimeOutMonitor refactoring
Date Thu, 04 Aug 2011 02:19:04 GMT
Bringing the following discussion to the public list.
HBASE-4015 is in the critical path of 0.92.

Cheers

On Wed, Aug 3, 2011 at 8:12 AM, Ramkrishna S Vasudevan <
ramakrishnas@huawei.com> wrote:

> Hi JD
>
> I was working on finalising a strategy to avoid the TimeoutMonitor race
> condition. I have a few queries from when I tried reproducing the issue and
> going through the code.
> The scenario mentioned in the defect is one where the region is left in the
> PENDING_OPEN state: RS1, which at first was not opening the region, moved the
> state from OFFLINE to OPENING when RS2 started opening the same region.
>
>
> When I tried to reproduce this and went through the code, I saw that when an
> RS makes the state changes OFFLINE->OPENING->OPENED, we always check the
> version of the znode before proceeding with the state update.
> So for the above-mentioned scenario I get a log saying
> "Region already hijacked? "
>
> Please correct me if I am wrong. Could you brief me more on the problem that
> causes this race condition?
>
> We are working on a strategy so that every RS is made aware of whether it
> should take up the assignment or not, by implementing some states which are
> visible to both master and RS.
>
> Once I am clear on the real root cause, I will upload our idea for overcoming
> the race condition.
>
> Thanks & Regards
> Ram
>
>
>
>
> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of
> Jean-Daniel Cryans
> Sent: Tuesday, August 02, 2011 3:52 AM
> To: ramakrishnas@huawei.com; ram_krish_86@hotmail.com
> Cc: stack; Ted Yu
> Subject: Re: HBASE-4060 - TimeOutMonitor refactoring
>
> I've not started working on this yet; happy to review your ideas/code, Ram.
>
> Thanks,
>
> J-D
>
> On Fri, Jul 29, 2011 at 7:54 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > Copying J-D.
> >
> > On Fri, Jul 29, 2011 at 7:38 AM, Ramkrishna S Vasudevan
> > <ramakrishnas@huawei.com> wrote:
> >>
> >> Hi Ted/Stack,
> >>
> >>
> >>
> >> We analyzed this and found that similar issues have occurred in our
> >> cluster a couple of times as well.
> >>
> >>
> >>
> >> So we are very much interested in taking it up, though we have not yet
> >> analyzed it or started the groundwork. I would also like to know if
> >> anyone is currently working on it. In particular, JD was very keen on
> >> this issue.
> >>
> >>
> >>
> >> Even if you guys already have a plan or solution for this, I would like
> >> to take part in it, or am even ready to implement a few things as part of it.
> >>
> >>
> >>
> >> I would like to know your comments and suggestions on this.
> >>
> >>
> >>
> >> Regards
> >>
> >> Ram
> >>
> >>
> >>
> >>
> >>
> >> P.S.: Please do reply to the id in CC as well, as I will be traveling
> >> over the weekend.
> >>
> >>
> >>
> >>
> >
>
>
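
For readers following the thread, the znode version check that Ram describes can be illustrated with a small in-memory sketch. This is not HBase or ZooKeeper code: the class `ZNodeVersionSketch`, the `RegionZNode` stand-in, and the `transition` method are all hypothetical names chosen for illustration. It only mimics the general idea that an update carries the version the writer last saw and is rejected if the node has changed since, which is the situation in which the "Region already hijacked? " log appears in the thread.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, in-memory stand-in for a region's unassigned znode.
// It is NOT the HBase or ZooKeeper API; it only illustrates a versioned update.
class ZNodeVersionSketch {
    enum State { OFFLINE, OPENING, OPENED }

    /** Immutable snapshot of the node: state, last writer, and version. */
    static final class Snapshot {
        final State state;
        final String owner;
        final int version;

        Snapshot(State state, String owner, int version) {
            this.state = state;
            this.owner = owner;
            this.version = version;
        }
    }

    static final class RegionZNode {
        // Assumption: the master creates the node as OFFLINE at version 0.
        private final AtomicReference<Snapshot> ref =
            new AtomicReference<>(new Snapshot(State.OFFLINE, "master", 0));

        Snapshot read() {
            return ref.get();
        }

        /**
         * Moves the node from one state to another only if the caller's
         * expected version still matches. Returns the new version, or -1
         * if another server changed the node in the meantime.
         */
        int transition(String server, State from, State to, int expectedVersion) {
            Snapshot current = ref.get();
            if (current.version != expectedVersion || current.state != from) {
                return -1;
            }
            Snapshot next = new Snapshot(to, server, current.version + 1);
            return ref.compareAndSet(current, next) ? next.version : -1;
        }
    }

    public static void main(String[] args) {
        RegionZNode znode = new RegionZNode();

        // Both region servers read the node while it is still OFFLINE.
        int seenByRs1 = znode.read().version;
        int seenByRs2 = znode.read().version;

        // RS2 wins the OFFLINE -> OPENING transition.
        int rs2Version = znode.transition("RS2", State.OFFLINE, State.OPENING, seenByRs2);

        // RS1 arrives late with a stale version and is rejected.
        int rs1Version = znode.transition("RS1", State.OFFLINE, State.OPENING, seenByRs1);
        if (rs1Version == -1) {
            System.out.println("RS1: Region already hijacked? Skipping open.");
        }

        // RS2 completes OPENING -> OPENED using the version it got back.
        znode.transition("RS2", State.OPENING, State.OPENED, rs2Version);
        Snapshot last = znode.read();
        System.out.println(last.owner + " " + last.state + " v" + last.version);
    }
}
```

In this sketch the stale writer is simply refused, which matches what Ram reports seeing when trying to reproduce the scenario; it says nothing about the root cause of the race discussed in the defect, which the thread leaves open.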
