aurora-issues mailing list archives

From "Rogier Dikkes (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AURORA-1811) sla_list_safe_domain no longer reports SLA usage
Date Tue, 08 Nov 2016 11:58:58 GMT
Rogier Dikkes created AURORA-1811:
-------------------------------------

             Summary: sla_list_safe_domain no longer reports SLA usage
                 Key: AURORA-1811
                 URL: https://issues.apache.org/jira/browse/AURORA-1811
             Project: Aurora
          Issue Type: Bug
          Components: Client, Maintenance, SLA
    Affects Versions: 0.16.0
         Environment: Vagrant image - Ubuntu, Centos 7.2
            Reporter: Rogier Dikkes
            Priority: Minor
             Fix For: 0.14.0


We recently had to patch hosts. In our situation we have a couple of services that run fewer
than 2-5 instances with production = true and tier = preferred, as shown in the default
example documentation.

As we understand it, host_drain cannot be configured with a minimum job instance count; the
default is 10. We therefore tried to use aurora_admin sla_list_safe_domain to compile a list
of the hosts running these services, to feed to host_drain as an unsafe_hosts_file.

When we ran
aurora_admin sla_list_safe_domain --min_job_instance_count=2 devcluster 95 1m
the scheduler returned:
 INFO] Response from scheduler: OK (message: )

This looks as if there are no hosts. We tried changing the percentage and duration to see
whether anything was returned, but we never received a different response.

To rule out the client as the cause, we used the 0.16.0 client against a 0.14.0 cluster;
that cluster does report hosts that are safe to kill without violating job SLAs.

To rule out a faulty cluster setup on our part, we started the vagrant sandbox and launched
a task with 3 instances, with tier = preferred and production = True.

Commands used:
aurora_admin sla_list_safe_domain --min_job_instance_count=2 devcluster 20 50m
aurora_admin sla_list_safe_domain --min_job_instance_count=2 devcluster 90 5m

Adding -l, or varying the duration and percentage, never changes the outcome.
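
For anyone trying to reproduce this, the percentage and duration variations can be generated with a small script. This is only a minimal sketch: the helper name sla_commands is hypothetical, it uses only the flag and positional arguments quoted in this report, and it prints the command lines rather than running them.

```python
import itertools


def sla_commands(cluster, percentages, durations, min_instances=2):
    """Build one sla_list_safe_domain argument list per (percentage, duration) pair.

    Hypothetical helper: it only assembles the arguments quoted in this report
    and does not contact a scheduler.
    """
    return [
        [
            "aurora_admin",
            "sla_list_safe_domain",
            "--min_job_instance_count=%d" % min_instances,
            cluster,
            str(percentage),
            duration,
        ]
        for percentage, duration in itertools.product(percentages, durations)
    ]


if __name__ == "__main__":
    # Percentages and durations taken from the commands in this report.
    for argv in sla_commands("devcluster", [20, 90, 95], ["1m", "5m", "50m"]):
        print(" ".join(argv))
```

Each printed line can be pasted into a shell on a machine with the client installed, which makes it easier to compare the output of a 0.14.0 and a 0.16.0 cluster side by side.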




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
