accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-4424) Do not wait to start Thrift servers until lock is acquired
Date Fri, 09 Sep 2016 23:13:20 GMT


Josh Elser commented on ACCUMULO-4424:

The general approach here is to start the Thrift server for the Master and the HTTP server
for the Monitor first, and only then block on obtaining the ZooKeeper lock.
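A minimal sketch of that startup ordering, assuming a `CountDownLatch` as a stand-in for the ZooKeeper leader lock. `LeaderLock` and `StandbyServer` are hypothetical names, not Accumulo's `ZooLock` API. The listening socket is bound first, and only afterwards does the process block waiting for leadership:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for the leader ZooKeeper lock (not ZooLock).
class LeaderLock {
  private final CountDownLatch acquired = new CountDownLatch(1);

  // Simulates this process winning the leader election.
  void grant() {
    acquired.countDown();
  }

  // Blocks the caller until the lock has been granted.
  void await() throws InterruptedException {
    acquired.await();
  }

  boolean isHeld() {
    return acquired.getCount() == 0;
  }
}

// Hypothetical server that binds its port before waiting on the lock.
class StandbyServer implements AutoCloseable {
  private final ServerSocket socket;
  private final LeaderLock lock;

  StandbyServer(LeaderLock lock) {
    this.lock = lock;
    try {
      // Step 1: bind right away (port 0 = any free port), so port-based checks see it.
      this.socket = new ServerSocket(0);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  int port() {
    return socket.getLocalPort();
  }

  // Step 2: only after binding, block until this process becomes the leader.
  void awaitLeadership() throws InterruptedException {
    lock.await();
  }

  // RPCs should be served only once the lock is held.
  boolean readyForRpcs() {
    return lock.isHeld();
  }

  @Override
  public void close() {
    try {
      socket.close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In this sketch the port is reachable as soon as the constructor returns, while `readyForRpcs()` stays false until `grant()` is called.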

The trick here is that we don't want to accept any RPCs until the lock is acquired. I have
done this simply, with an InvocationHandler wrapping the Thrift Iface and a quick check in
the Monitor servlets.
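A minimal sketch of such an `InvocationHandler` barrier, assuming a hypothetical `ClusterService` interface in place of a generated Thrift `Iface` and a `BooleanSupplier` in place of `ZooLock.isLockHeld()`. The `IllegalStateException` is also an assumption; a real Thrift handler would need to throw something its IDL declares.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a generated Thrift Iface.
interface ClusterService {
  String getStatus();
}

class ClusterServiceImpl implements ClusterService {
  @Override
  public String getStatus() {
    return "OK";
  }
}

// Rejects calls until the lock check passes, then delegates to the real implementation.
class LockGatedHandler implements InvocationHandler {
  private final Object delegate;
  private final BooleanSupplier lockHeld;

  LockGatedHandler(Object delegate, BooleanSupplier lockHeld) {
    this.delegate = delegate;
    this.lockHeld = lockHeld;
  }

  @Override
  public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    // Let Object methods (toString, hashCode, equals) through so logging a proxy never trips the check.
    boolean objectMethod = method.getDeclaringClass() == Object.class;
    if (!objectMethod && !lockHeld.getAsBoolean()) {
      throw new IllegalStateException("Leader lock not held; refusing " + method.getName());
    }
    try {
      return method.invoke(delegate, args);
    } catch (InvocationTargetException e) {
      // Surface the implementation's own exception instead of the reflection wrapper.
      throw e.getCause();
    }
  }

  // Wraps an implementation in a dynamic proxy that enforces the lock check.
  static <T> T wrap(Class<T> iface, T impl, BooleanSupplier lockHeld) {
    Object proxy = Proxy.newProxyInstance(
        iface.getClassLoader(), new Class<?>[] {iface}, new LockGatedHandler(impl, lockHeld));
    return iface.cast(proxy);
  }
}
```

Because the handler only depends on the interface type, the same wrapper could sit in front of any service interface without touching its implementation.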

It turns out that the GC was already doing this. We don't care about protecting its RPC server,
since it only serves metrics.

One concern I have is that {{ZooLock.isLockHeld()}}, which is invoked on every call, is
a synchronized method. This means that every RPC the Master receives would first acquire
that method's monitor before actually being processed. I need to dig a little to see whether
this will actually be an issue...
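One way to keep the per-RPC check off a monitor, sketched under assumptions: the lock state is cached in a `volatile` field that hypothetical `onLockAcquired()`/`onLockLost()` callbacks update, so each RPC pays a single volatile read instead of entering a `synchronized` method. `LockState` and `SynchronizedLockState` are illustrative names, not Accumulo's `ZooLock`.

```java
// Hypothetical lock-state holder; not Accumulo's ZooLock.
class LockState {
  // volatile: a write by the lock-watcher thread is visible to RPC threads without locking.
  private volatile boolean held = false;

  // Invoked by whatever watches the ZooKeeper lock node (hypothetical callbacks).
  void onLockAcquired() {
    held = true;
  }

  void onLockLost() {
    held = false;
  }

  // Called on every RPC: one volatile read, no monitor acquisition.
  boolean isHeld() {
    return held;
  }
}

// For contrast, the shape of the concern: every RPC enters the same monitor.
class SynchronizedLockState {
  private boolean held = false;

  synchronized void onLockAcquired() {
    held = true;
  }

  synchronized boolean isHeld() {
    return held;
  }
}
```

Uncontended `synchronized` calls are usually cheap on modern JVMs, so whether the synchronized version is a real bottleneck is something to measure rather than assume.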

> Do not wait to start Thrift servers until lock is acquired
> ----------------------------------------------------------
>                 Key: ACCUMULO-4424
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: rpc
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 2.0.0
>          Time Spent: 20m
>  Remaining Estimate: 0h
> Had an Accumulo + Ambari user report a funny issue:
> When starting multiple masters, monitors, and GCs, they observed that, despite Accumulo being healthy, Ambari kept reporting that two-thirds of the instances of each service were down. This is because Ambari's service check expects the Thrift service to be up.
> Presently, for services where only one active instance is allowed, we do not start the Thrift server until we acquire the leader ZK lock. I propose that we still start these servers, but introduce a barrier that prevents any API calls from succeeding until the leader lock is obtained. This has a couple of benefits:
> * Better "health" check -- processes might be zombied, so a pidfile check would be insufficient
> * Less confusion around a process that is running but not binding its port (I have personally dealt with a case where a user was confused and thought the services were incorrectly stuck on startup)
> I believe this would also be pretty simple to do, since the leader election is already implemented in one place (only the znode differs).
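The port-based service check described in the report can be approximated with a plain TCP connect. This is a minimal sketch only: how Ambari actually implements its check is not stated here, and `PortCheck.isListening` is a hypothetical helper. Under the proposed change, such a probe would succeed on standby instances as well, while their RPCs are still refused until the leader lock is held.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {
  // Returns true if something accepts a TCP connection on host:port within timeoutMs.
  static boolean isListening(String host, int port, int timeoutMs) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMs);
      return true;
    } catch (IOException e) {
      // Connection refused, timed out, or host unreachable.
      return false;
    }
  }
}
```

A successful connect only shows that the port is bound, which is exactly the signal a pidfile check cannot give for a zombied process.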

This message was sent by Atlassian JIRA
