curator-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Drob <mad...@cloudera.com>
Subject Re: CURATOR-79
Date Fri, 08 Aug 2014 22:27:59 GMT
+1


On Fri, Aug 8, 2014 at 4:58 PM, Cameron McKenzie <mckenzie.cam@gmail.com>
wrote:

> The proposed alternative is pretty much the same thing, but moving the
> logic to the LockInternals class. The added complexity of this is that the
> LockInternals class doesn't know what the name of the zNode that was going
> to be created is called. We'd need to expose the generated UUID on the
> CreateBuilder.
>
> My preference is to fix this in the protected() code. We can either just
> handle the InterruptedException, or handle any non KeeperException (other
> than ConnectionLoss). The InterruptedException is what's happening in this
> case, but if we get an NPE or something in that code, the same thing would
> occur, so I think that perhaps we should just have a blanket catch all.
>
> I have coded this up already and all of the unit tests pass ok.
> cheers
>
>
> On Fri, Aug 8, 2014 at 11:05 PM, Mike Drob <madrob@cloudera.com> wrote:
>
> > Explicitly coding for a possible InterruptedException sounds good to me,
> > but we already know that I prefer (more) diverse Exception types.
> >
> > I'm not sure I understand what the proposed alternative is?
> >
> > Mike
> >
> >
> > On Fri, Aug 8, 2014 at 1:02 AM, Cameron McKenzie <mckenzie.cam@gmail.com
> >
> > wrote:
> >
> > > Guys,
> > > I've been looking into a fix for CURATOR-79 (
> > > https://issues.apache.org/jira/browse/CURATOR-79) and have found it to
> > be
> > > slightly more complicated than initially expected.
> > >
> > > The locking recipes are using protected zNodes (i.e the zNode name
> > contains
> > > a random UUID that is tied to a particular builder instance) for locks,
> > > which is sensible, but there seems to be an issue with this.
> > >
> > > The protected logic basically looks for the cause of failure on a
> create,
> > > and if it's connection loss, then it does an ensured deleted on the
> path
> > it
> > > was trying to create to ensure that it's removed if it did get created.
> > >
> > > For CURATOR-79, and InterruptedException is causing this call to fail
> > when
> > > waiting for the response from ZK. This means that the protected logic
> > does
> > > not fire and we end up with an orphaned node.
> > >
> > > It's possible with some ugliness to handle this in the
> InterprocesMutex,
> > > but I think that maybe it's better fixed in the protected logic. Maybe
> > the
> > > protected logic could be modified so that it will occur on
> ConnectionLoss
> > > or on any non-KeeperException (i.e. InterruptedException). This would
> > cause
> > > the zNode to be removed if it was created, and would fix this deadlock
> > > issue.
> > >
> > > I would welcome anyone's opinion on the way forward.
> > > cheers
> > > Cam
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message