aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ashwin Murthy <ashwinmur...@gmail.com>
Subject Re: Review Request 45511: AURORA-1493: create ELB-friendly endpoint to detect leading scheduler
Date Fri, 01 Apr 2016 03:19:09 GMT


> On March 31, 2016, 6:33 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/http/LeaderHealth.java, line 49
> > <https://reviews.apache.org/r/45511/diff/4/?file=1320481#file1320481line49>
> >
> >     Can we switch on this enum, rather than if/else branching? If we also add an
explicit case for each enum value and reserve the default block for an unexpected status it
will allow this code to continue to work as expected even if new values are added to the enum
in the future (i.e. as things stand today, if we added a hypothetical state like `LeaderStatus.ALSO_LEADING`,
it would fall into the `SERVICE_UNAVAILABLE` branch).
> 
> Ashwin Murthy wrote:
>     @Bill, @Joshua, 
>     
>     Can I do the following?
>     
>     1. Return 200 is leader
>     2. Return 501 (Not implemented) if this is not the leader but a leader exists
>     3. Return 503 (Service unavailable) if there is no current leader
>     4. Return 500 (Internal server error) for the default case that Joshua points out
>     
>     Would you be ok with this? I
> 
> Joshua Cohen wrote:
>     In the [LeaderRedirectFilter](https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/http/LeaderRedirectFilter.java#L70-L91)
we do the following:
>     
>     Is Leading -> continue request (i.e. 200)
>     No Leader -> 503
>     Not Leading -> 307
>     Unknown State -> 500
>     
>     So that meshes pretty well with what you describe, you're just replacing the redirect
when the instance is not the leader with a non 2xx. I'm not sure if I love 501 for that case,
but I'm having a hard time finding a better alternative.
>     
>     That being said, I feel like I've asked this before, but you can clarify why the
existing `LeaderRedirectFilter` is not sufficient for ELB? If I'm reading the [docs](http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-healthchecks.html#health-check-configuration)
right, it seems like any non-200 reply is sufficient to indicate the instance is not healthy
(i.e. not the leader). So given that `LeaderRedirectFilter` already returns a 307 for this
case, can we not just use that?
> 
> Ashwin Murthy wrote:
>     @Bill, @Joshua, 
>     
>     Actually, I changed my mind. The prev one is rather more complex for this simple
scenario than needs to be.
>     
>     How about?:
>     Return 200 if instance is leader
>     Return 503 (Service unavailablE) for the case when this instance is not the leader
but a leader exists
>     Return 500 (internal server error) for all other (default) cases since this is truly
an error. 
>     
>     Would you be ok with this?

The reason the 307 does not work in all cases is many LBs like HAProxy default to using response
code 307 to mean healthy. Some LBs might even case the redirect. Since strictly speakin per
HTTP RFC, this seems wrong to do, having a separate endpoint that returns a 500 level code
is being pursued.


- Ashwin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45511/#review126394
-----------------------------------------------------------


On March 31, 2016, 5:06 a.m., Ashwin Murthy wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45511/
> -----------------------------------------------------------
> 
> (Updated March 31, 2016, 5:06 a.m.)
> 
> 
> Review request for Aurora and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> AURORA-1493: create ELB-friendly endpoint to detect leading scheduler. The fix is to
add a new endpoint - "/leaderhealth" which returns http status code 200 (OK) if the instance
is the leader. If the instance is not the leader but a leading exists, returns 500 (Internal
server error). If there is no leader at all, returns 503 (Service unavailable)
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/http/LeaderHealth.java PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/http/LeaderHealthTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/45511/diff/
> 
> 
> Testing
> -------
> 
> Added new unit test
> 
> 
> Thanks,
> 
> Ashwin Murthy
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message