aurora-issues mailing list archives

From "Isaac Councill (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-761) Provide a proxy for generic service discovery
Date Tue, 14 Oct 2014 23:02:34 GMT

    [ https://issues.apache.org/jira/browse/AURORA-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171678#comment-14171678 ]

Isaac Councill commented on AURORA-761:
---------------------------------------

I'll start with creating a pool of records since that's easy. By default, clients get a list
of all healthy backends for an app from Consul. Clients can even filter on arbitrary tags
provided by jobs, which could come in handy.
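
A minimal sketch of that lookup, assuming the Consul HTTP health endpoint
(/v1/health/service/<name>) on a local agent at 127.0.0.1:8500. The function names and
the sample service name and tag are hypothetical; only the URL construction and response
parsing are shown, so nothing here talks to a real agent.

```python
import json
from urllib.parse import quote, urlencode


def healthy_backends_url(service, tag=None, agent="http://127.0.0.1:8500"):
    """Build a health-endpoint URL that asks only for passing instances."""
    query = [("passing", "true")]
    if tag is not None:
        # Optional tag filter, e.g. a tag the job attached at registration.
        query.append(("tag", tag))
    return f"{agent}/v1/health/service/{quote(service, safe='')}?{urlencode(query)}"


def parse_backends(body):
    """Extract (address, port) pairs from a health-endpoint JSON response."""
    backends = []
    for entry in json.loads(body):
        svc = entry["Service"]
        # The service address may be empty; fall back to the node address.
        address = svc.get("Address") or entry["Node"]["Address"]
        backends.append((address, svc["Port"]))
    return backends
```

A client would fetch healthy_backends_url("web", tag="prod") and pass the response body
to parse_backends to get its pool of records.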

As for moved backends: they can largely be handled the way the current Announcer handles them,
with the difference being that reliance on ZK session timeouts to remove ephemeral znodes
must be replaced with Consul checks. Consul agents will be colocated with tasks on the Mesos
slaves, so any kind of check is possible, in theory even making sure a PID exists
in the local process table (needs thought). TTL checks are supported, which would be analogous
to the ZK session keepalive: in that scheme, clients post state to Consul and are marked
unhealthy if a post does not arrive within a specified timeout.
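
A rough sketch of the TTL scheme, assuming the Consul agent endpoints
/v1/agent/service/register and /v1/agent/check/pass/<check id>, and assuming that a check
attached to a service gets the default ID "service:<service id>". The helper names and the
heartbeat safety factor are hypothetical; the code only builds the payload and path that a
task would send to its local agent.

```python
def ttl_registration(service_id, name, port, ttl_seconds, tags=()):
    """Build the JSON payload for registering a service with a TTL check."""
    return {
        "ID": service_id,
        "Name": name,
        "Port": port,
        "Tags": list(tags),
        # The service turns critical if no heartbeat arrives within the TTL.
        "Check": {"TTL": f"{ttl_seconds}s"},
    }


def heartbeat_path(service_id):
    """Path the task hits periodically to report that it is still alive."""
    return f"/v1/agent/check/pass/service:{service_id}"


def heartbeat_interval(ttl_seconds, safety_factor=3):
    """Post several times per TTL window so one lost post is not fatal."""
    return ttl_seconds / safety_factor
```

This mirrors the ZK keepalive: the TTL plays the role of the session timeout and the
periodic post plays the role of the session heartbeat. Note that a missed TTL only marks
the service unhealthy; it does not remove the registration.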

However, health checks are only part of the story. It's great that unhealthy jobs won't be
served up in client requests, but we don't want thousands of failed backends still subject to
futile health checks after a few weeks of task movement. The main question is when to actually
deregister a service. A great thing about serversets is that deregistration happens
automatically on session timeouts. I don't see any way to replicate that behavior (yet) with
Consul, but I'm still learning.

Options I see:

the elephant) Integrate Consul in the scheduler. My strong inclination is to avoid that.

the lol) Trick the Consul backend node into deregistering itself by having it execute a health
check script with a callback to the Consul API on fatal conditions (e.g., missing PID; again,
more thought needed). I would make sure I'm not just ignorant of supported auto-deregistration
features in Consul before going there.

the final process) Do Consul deregistration in a final process. I'm not sure how robust that
would be in the face of update/killall, but I haven't played around with it much. Cleanest option
so far.
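
The last two options both come down to two small pieces: a liveness test for a PID and a
deregistration call against the local agent. A sketch of both, assuming a POSIX host and the
Consul agent endpoint /v1/agent/service/deregister/<service id> called with PUT. The function
names and the agent address are hypothetical, and how the script would be wired into a check
or a final process is left open.

```python
import os
import urllib.request


def pid_exists(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # Signal 0 checks for existence without sending anything.
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # The process exists but belongs to another user.
    return True


def deregister_request(service_id, agent="http://127.0.0.1:8500"):
    """Build the request that removes a service from the local agent."""
    url = f"{agent}/v1/agent/service/deregister/{service_id}"
    return urllib.request.Request(url, method="PUT")


def deregister(service_id, agent="http://127.0.0.1:8500"):
    """Send the deregistration; returns True if the agent answered 200."""
    request = deregister_request(service_id, agent)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status == 200
```

In the "lol" variant, a check script would call deregister when pid_exists returns False.
In the final-process variant, the task would call deregister unconditionally on its way out.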

You're absolutely welcome to talk me out of this route. Auto-configuring HAProxy directly
from ZK would be so clean and easy, as would writing non-DNS clients to get records. DNS
is cool, though, and Consul provides some pretty nice features. It also seems like it would
be a network traffic win, but I've got to kick Consul around more to find out for sure that's
the case.


> Provide a proxy for generic service discovery
> ---------------------------------------------
>
>                 Key: AURORA-761
>                 URL: https://issues.apache.org/jira/browse/AURORA-761
>             Project: Aurora
>          Issue Type: Story
>          Components: Service Discovery, Usability
>            Reporter: Bill Farner
>            Priority: Minor
>
> While {{Announcer}} provides service registration, we lack a cross-cutting answer for
> service discovery.  There are well-known libraries that will do it (e.g. finagle), but we
> need an answer for others.  Marathon, for example, provides a script called {{haproxy_marathon_bridge}}
> that reloads configuration of HAProxy for this purpose.  We could do something similar with
> a mixin {{Process}} that dynamically routes an inbound port to a serverset path in ZooKeeper.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
