accumulo-dev mailing list archives

From John Vines <vi...@apache.org>
Subject Re: Interesting bug report
Date Tue, 26 Jan 2016 17:33:02 GMT
That sounds like great follow-on work (clients register ephemerally so the
master can tell clients to disconnect, etc.), but I think just having a
client that can get a better read on the state of the system is a
phenomenal starting point.
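
A minimal sketch of what such a client-side check might look like, in Python for brevity. Everything here is an assumption made for illustration: `GoalState`, `ClusterShutDownError`, `call_with_fail_fast`, and the `read_goal_state` callback (which stands in for whatever lookup, for example a ZooKeeper read, would report the master's goal state) are hypothetical names and not part of any Accumulo API.

```python
import time
from enum import Enum


class GoalState(Enum):
    # Hypothetical goal states used only for this sketch.
    NORMAL = "NORMAL"
    CLEAN_STOP = "CLEAN_STOP"


class ClusterShutDownError(RuntimeError):
    """Raised when the cluster was shut down on purpose, so retrying is pointless."""


def call_with_fail_fast(operation, read_goal_state, timeout_s=30.0, pause_s=0.5,
                        clock=time.monotonic, sleep=time.sleep):
    """Retry `operation` on connection errors, but only within limits.

    Before each attempt the goal state is consulted: a deliberate shutdown
    fails immediately with a descriptive message, while a transient outage
    is retried until `timeout_s` has elapsed.
    """
    deadline = clock() + timeout_s
    while True:
        if read_goal_state() is GoalState.CLEAN_STOP:
            raise ClusterShutDownError(
                "cluster goal state is CLEAN_STOP; not retrying")
        try:
            return operation()
        except ConnectionError as exc:
            # Transient failure: give up once the deadline has passed.
            if clock() >= deadline:
                raise TimeoutError(
                    f"gave up after {timeout_s}s of retries") from exc
            sleep(pause_s)
```

The admin-selectable behavior Keith mentions (clients die, wait, or something else) could be modeled as additional states returned by `read_goal_state`; this sketch covers only the fail-fast case and the bounded-retry case.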

On Tue, Jan 26, 2016 at 11:52 AM Keith Turner <keith@deenlo.com> wrote:

> On Mon, Jan 25, 2016 at 10:59 AM, John Vines <vines@apache.org> wrote:
>
> > Of course, it's when I hit send that I realize that we could mitigate by
> > making the client aware of the master state, and if the system is shut down
> >
>
> That's a good idea.  Should consider the use case when someone wants to shut
> Accumulo down and bring it back up immediately.  We could allow an admin to
> decide what they want clients to do when they shut down Accumulo (clients
> die, wait, anything else?).  This could be accomplished with supplemental
> information in ZK or other goal states.
>
>
> > (which was the case for that ticket), then it can fail quickly with a
> > descriptive message.
> >
> > On Mon, Jan 25, 2016 at 10:58 AM John Vines <vines@apache.org> wrote:
> >
> > > While we want to be fault tolerant, there's a point where we want to
> > > eventually fail. I know we have a couple never ending retry loops that need
> > > to be addressed (https://issues.apache.org/jira/browse/ACCUMULO-1268),
> > > but I'm unsure if queries suffer from this problem.
> > >
> > > Unfortunately, fault tolerance is a bit at odds with instant notification
> > > of system issues, since some of the fault tolerance is temporally oriented.
> > > And that ticket lacks context of it never failing out vs. failing out
> > > eventually (but too long for the user)
> > >
> > >
> > > On Sun, Jan 24, 2016 at 7:46 PM Christopher <ctubbsii@apache.org> wrote:
> > >
> > >> I saw this bug report:
> > >> https://bugzilla.redhat.com/show_bug.cgi?id=1300987
> > >>
> > >> As far as I can tell, they are reporting normal, expected, and desired
> > >> behavior of Accumulo as a bug. But, is there something we can do upstream
> > >> to enable fast failures in the case of Accumulo not running to support
> > >> their use case?
> > >>
> > >> Personally, I don't see how we can reliably detect within the client that
> > >> the cluster is down or up, vs. a normal temporary server outage/migration,
> > >> since there is no single point of authority for Accumulo to
> > >> determine its overall operating status if ZooKeeper is running and no other
> > >> servers are. Am I wrong?
> > >>
> > >
> >
>
