drill-issues mailing list archives

From "John Omernik (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4286) Have an ability to put server in quiescent mode of operation
Date Fri, 10 Jun 2016 12:52:21 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324398#comment-15324398
] 

John Omernik commented on DRILL-4286:
-------------------------------------

Paul and I had an offline discussion on this as well, so I will repeat some things I mentioned
to Paul before.

I like the idea of a state. In my post to Paul, I added a znode and created a "desired"
state value. I'll explain the reason for this a bit below; however, I will say I am not a
ZooKeeper expert, so having a znode that drillbits watch was one of those things that sounded
good on the surface, but I worried about performance on, say, a 1000-node cluster. To support
my idea, and before I explain it, we could add a "desired_state.poll.interval.seconds"
configuration variable, which would be the interval at which a drillbit polls its znode
to determine the desired state. This interval would start out with random(int(0 - desired_state.poll.interval.seconds))
(that's not any language, just a way to represent that the first poll would happen a random number
of seconds between 0 and the poll interval, so there would be some staggering of the requests).
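
Just to make the staggering concrete, here is a rough sketch (the class name and the way the
config value gets here are made up for illustration, nothing here is real Drill code):

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

public class DesiredStatePoller {

  private final long pollIntervalSeconds;   // value of "desired_state.poll.interval.seconds"
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public DesiredStatePoller(long pollIntervalSeconds) {
    this.pollIntervalSeconds = pollIntervalSeconds;
  }

  /** First poll lands a random number of seconds into [0, interval) so a big cluster
      does not hit ZooKeeper all at once; after that, poll at the fixed interval. */
  public void start(Runnable checkDesiredState) {
    long initialDelay = ThreadLocalRandom.current().nextLong(Math.max(1, pollIntervalSeconds));
    scheduler.scheduleAtFixedRate(checkDesiredState, initialDelay, pollIntervalSeconds, TimeUnit.SECONDS);
  }
}
{code}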
  

Ok znode1: state

As Paul said: "not set", "START", "RUN", "DRAIN", "STOP". My initial suggestion did not have
a "not set"; i.e., when the drillbit registered initially, it always registered with "START"
and only changed to "RUN" when everything was healthy. Also, I didn't have "STOP"; instead
I had "DRAINING" (in addition to "DRAINED"). I think Paul's "DRAIN" may be my "DRAINING" and Paul's
"STOP" may be my "DRAINED". If that is so, then I think we should discuss this. A drillbit
that is drained is not "stopped": it is still running, and I want its state to be clear. My
idea is that a bit can be running, but not accepting queries, and not in
a "shutting down" mode. This may assist in future use cases with troubleshooting or other
administration tasks. Also, the state of "DRAINING" is different from that of "DRAINED" in
how the administrator looks at things.
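
To make the distinction concrete, the set of states I have in mind would look roughly like
this (just a sketch of my proposal, not existing Drill code):

{code}
/** Possible values for the per-drillbit "state" znode (znode1). */
public enum DrillbitState {
  NOT_SET,   // znode exists but nothing written yet (Paul's "not set")
  START,     // registered, still coming up
  RUN,       // healthy and accepting query work
  DRAINING,  // finishing in-flight work, not accepting new work
  DRAINED    // no work left; still running, but not accepting queries
}
{code}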

znode2: desired_state

Like I said in my first paragraph, I am a bit worried my lack of understanding of ZooKeeper
may preclude this; however, I think there are some advantages here. As I wrote to Paul, it's
nice that we have the SIGTERM methodology built in, but that's a coarse tool. First, it assumes
that the "desired" state is only to shut down, draining queries as it goes. It's also a
"bit only" feature; as Paul said, it doesn't stop other nodes from trying to include that
node in a query. So what does that do from a failure perspective? I.e., if a different node
acting as foreman plans a query including that node right now, does the shutting-down node
know more work is coming, or could there be a race condition where the shutting-down node
believes it is done, so it exits, and then the other foreman sends work to a dead node, i.e. a
failed query? More so, I don't like SIGTERM as the initiator, because we need
to let the cluster know of that drillbit's state as well. Edge case: we have a node in a bad
state, we send SIGTERM to it, and it ignores it for whatever reason; will other foremen
still assign work? Could we get into wonky cluster states because of that? In addition,
when looking at Paul's idea of a REST option for remote shutdown, we have to assume that
the node is in a good state: it has to be the thing accepting the command to start the draining.
Thus, if you sent a REST command to drain, and that node was in a halfway state or
a state where it didn't follow through on the request, other bits may still send work to that node,
especially if, for whatever reason, that hinky drillbit couldn't update its state.

So, my solution is to use a "desired_state" (please also see the "heartbeat" note below).


We aim to deprecate the SIGTERM methodology. This is a cluster of computers; sending remote
SIGTERMs is not something I think makes sense at scale. Instead, we have in the WebUI and
the REST API, as Paul stated, the state, and then my additional desired state. Any admin user
can update the desired state of any node in the cluster. This is done through a simple API
call (and a check of permissions). Nodes start with a default desired state of "RUN" (although,
as I mentioned to Paul, I think we could add an option such as "drillbit.default.desired_state",
which by default is set to RUN). This way an administrator, if they have reasons, could start
drillbits in, say, a "drained" state. I.e., if during "START" the desired_state is "DRAINED",
the bit would move to this state rather than "RUN".
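
A rough sketch of that startup decision (the config key is the one I proposed above, and the
StateStore wrapper over the two znodes is hypothetical; Drill's actual config plumbing may
differ, I am only assuming the Typesafe Config API here):

{code}
import com.typesafe.config.Config;

public class DrillbitStartup {

  private static final String DEFAULT_DESIRED_STATE_KEY = "drillbit.default.desired_state";

  /** Thin, hypothetical wrapper over the two per-bit znodes. */
  interface StateStore {
    void writeState(String bitId, String state);
    void writeDesiredState(String bitId, String desiredState);
  }

  /** Decide the initial state/desired_state pair when the bit registers. */
  public static void register(Config config, StateStore zk, String bitId) {
    String desired = config.hasPath(DEFAULT_DESIRED_STATE_KEY)
        ? config.getString(DEFAULT_DESIRED_STATE_KEY)
        : "RUN";

    zk.writeDesiredState(bitId, desired);   // znode2
    zk.writeState(bitId, "START");          // znode1

    // Once startup completes: move to RUN unless the admin asked for a drained start.
    String next = "DRAINED".equals(desired) ? "DRAINED" : "RUN";
    zk.writeState(bitId, next);
  }
}
{code}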

A healthy drillbit will poll its desired state at the poll interval above, and will always
try to achieve its desired state. Thus if the state is RUN, and on the next poll it sees the
desired_state change to "DRAINED", it will change its state to DRAINING until queries
are done; and other nodes, when scheduling queries, could read the current state of all bits
and, if a bit is NOT in "RUN", not include it in the planning. This helps with the potential
race condition that exists in the current SIGTERM method.
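
On the foreman side, the planning filter would be about this simple (sketch only, reusing the
hypothetical znode-reader idea from above):

{code}
import java.util.List;
import java.util.stream.Collectors;

public class PlanningFilter {

  /** Hypothetical read-only view of the two znodes. */
  interface StateReader {
    String readState(String bitId);
    String readDesiredState(String bitId);
  }

  /** Only bits whose current state AND desired state are RUN get work assigned. */
  public static List<String> schedulableBits(List<String> allBits, StateReader zk) {
    return allBits.stream()
        .filter(bit -> "RUN".equals(zk.readState(bit))
                    && "RUN".equals(zk.readDesiredState(bit)))
        .collect(Collectors.toList());
  }
}
{code}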

Now, about the "heartbeat" I mentioned above: I've seen some posts mentioning a heartbeat
mechanism, but I am in the dark on how it can work. A new foreman, when submitting a
query, checks the two znodes (state and desired_state); if either of them is not "RUN", then it wouldn't
include the bit in the query. If, however, both are RUN, and the foreman goes to schedule,
and something errors out on that node's work, or if some heartbeat check fails on work submission,
the foreman could set the "state" to "Error/Unknown". This would help other queries quickly
ignore this bit. Now, the conditions that could put a node's state into "Error/Unknown"
would have to be well monitored, to ensure we don't have nodes dropping for the wrong reasons,
but this could help the overall stability of the cluster, in that new work would not be sent
to this bit of unknown state. In addition, once a node is in this state, only it can change
that state. The state should only be changed by the node itself, unless that state change
is based on an error/unknown condition.
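
Roughly what I have in mind for the foreman-side failure path (sketch only; the Error/Unknown
state and the "only the owner changes its own state" rule are part of this proposal, not
existing behavior, and the RPC call here is just a stand-in):

{code}
public class FragmentSubmitter {

  /** Hypothetical writer for the per-bit state znode. */
  interface StateWriter {
    void writeState(String bitId, String state);
  }

  /**
   * Try to hand work to a bit; if submission itself fails, flag the bit so
   * other foremen skip it on their next planning pass.
   */
  public static boolean submit(String bitId, Runnable work, StateWriter zk) {
    try {
      work.run();  // stand-in for the actual RPC that ships the fragment
      return true;
    } catch (RuntimeException e) {
      // Exception to the "only the node changes its own state" rule:
      // a foreman may mark a bit ERROR_UNKNOWN when submission fails,
      // and only that bit may move itself back to RUN afterwards.
      zk.writeState(bitId, "ERROR_UNKNOWN");
      return false;
    }
  }
}
{code}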

Overall I think this approach would provide stability and flexibility when you have weird
hardware issues, memory issues, etc. across a cluster. It would allow admins to easily manually
select nodes for draining, or move them out of operation for testing, log gathering, stack traces,
etc. In addition, the changing of state is a cluster-wide operation, both in how the node learns
about its desired state change AND how the other nodes learn about cluster state changes.


This approach would also not require any changes to YARN to work. SIGTERM could still be supported
for healthy nodes, but the logic would change to start the draining process via a znode update,
and then, when the state changes from DRAINING to DRAINED, the SIGTERM handler would exit the
process. Basically, this replicates what is happening now, while using the framework (and keeping
other nodes from sending jobs to the draining node).
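
In other words, the SIGTERM path just becomes a shutdown hook that flips the znodes and waits,
something like this (sketch only, same hypothetical StateStore idea as above):

{code}
public class GracefulShutdown {

  /** Hypothetical view of the two znodes for this bit. */
  interface StateStore {
    String readState(String bitId);
    void writeDesiredState(String bitId, String desiredState);
  }

  /** Installed once at startup; SIGTERM triggers the JVM shutdown hooks. */
  public static void install(String bitId, StateStore zk) {
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      zk.writeDesiredState(bitId, "DRAINED");   // same path the REST API would use
      // Wait for the bit's own state machine to report that draining finished.
      while (!"DRAINED".equals(zk.readState(bitId))) {
        try {
          Thread.sleep(1000);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
      }
      // At this point no foreman is planning work here, so the process can exit.
    }));
  }
}
{code}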

I would be very interested in discussion on this. This is a challenge for other SQL-on-Hadoop
tools, and it really is a needed feature for a high-availability cluster that still has the ability
to be administered, patched, etc.



> Have an ability to put server in quiescent mode of operation
> ------------------------------------------------------------
>
>                 Key: DRILL-4286
>                 URL: https://issues.apache.org/jira/browse/DRILL-4286
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Execution - Flow
>            Reporter: Victoria Markman
>
> I think drill will benefit from mode of operation that is called "quiescent" in some databases.
> From IBM Informix server documentation:
> {code}
> Change gracefully from online to quiescent mode
> Take the database server gracefully from online mode to quiescent mode to restrict access
> to the database server without interrupting current processing. After you perform this task,
> the database server sets a flag that prevents new sessions from gaining access to the database
> server. The current sessions are allowed to finish processing. After you initiate the mode
> change, it cannot be canceled. During the mode change from online to quiescent, the database
> server is considered to be in Shutdown mode.
> {code}
> This is different from shutdown, when processes are terminated. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
