aurora-dev mailing list archives

From Josh Adams <j...@foursquare.com>
Subject Re: Task Constraints
Date Wed, 16 Jul 2014 19:21:24 GMT
+Leo Kim who is looking at the compiler error with us.


On Wed, Jul 16, 2014 at 8:25 AM, Kevin Burg <kburg@foursquare.com> wrote:

> The idea with the fix is to read the slave's attributes right off the
> offer rather than going into 'AttributeStore' and keying on the slave's
> name. The slave's resources are read off the offer in this way, so I don't
> see why it can't be done with attributes as well.
>
> Someone who understands all the places where SchedulingFilter.filter is
> used might be able to fix this better than I can.
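The idea quoted above (matching constraints against the attributes carried on the offer instead of a store keyed by slave name) can be sketched as a small model. This is not the scheduler's code: `Offer`, `ValueConstraint`, `satisfied`, and `stale_store` are hypothetical names made up for illustration, and the real `SchedulingFilter.filter` and `AttributeStore` interfaces are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Offer:
    """Hypothetical stand-in for a resource offer that carries slave attributes."""
    slave_host: str
    attributes: Dict[str, Set[str]] = field(default_factory=dict)


@dataclass
class ValueConstraint:
    """Hypothetical value constraint: attribute `name` must take one of `values`."""
    name: str
    values: Set[str]
    negated: bool = False


def satisfied(constraint: ValueConstraint, attributes: Dict[str, Set[str]]) -> bool:
    """Return True when the attributes satisfy the constraint."""
    present = attributes.get(constraint.name, set())
    matches = bool(present & constraint.values)
    # A negated constraint is satisfied exactly when no value matches.
    return matches != constraint.negated


# A store keyed by slave name that still holds attributes from an old registration.
stale_store = {"slave1": {"host": {"slave1"}, "rack": {"r1"}}}

# The offer itself carries the current attributes, including the new one.
offer = Offer("slave1", {"host": {"slave1"}, "rack": {"r1"}, "staging": {"true"}})
constraint = ValueConstraint("staging", {"true"})

via_store = satisfied(constraint, stale_store[offer.slave_host])  # stale lookup
via_offer = satisfied(constraint, offer.attributes)               # read off the offer
```

In this model the stale lookup rejects the offer while the offer-based lookup accepts it, which is the mismatch the proposed fix is meant to remove.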
>
>
> On Wed, Jul 16, 2014 at 6:40 AM, Josh Adams <josh@foursquare.com> wrote:
>
>> Hi there,
>>
>> Given that we would need to disrupt running jobs to add constraints in
>> the future, we are blocked on
>> https://issues.apache.org/jira/browse/AURORA-582 before we can push any
>> of our services on to Aurora in production.
>>
>> Kevin Burg attempted to resolve the related bug
>> https://issues.apache.org/jira/browse/AURORA-328 by making some changes
>> here:
>> https://github.com/foursquare/incubator-aurora/commit/b1962fad3fe9ef76954fa107abed25d78b809331
>> but we seem to be getting a type mismatch when compiling the code.
>>
>> Any help and/or info on the bugfix progress would be much appreciated.
>> Aside from AURORA-582 we are ready to roll (pun intended!)
>>
>> Best,
>> Josh
>>
>>
>> On Mon, Jul 14, 2014 at 11:42 AM, Josh Adams <josh@foursquare.com> wrote:
>>
>>> Ah, makes sense. We'll try that. Thanks for clarifying this Kevin.
>>>
>>> Josh
>>>
>>>
>>> On Mon, Jul 14, 2014 at 11:30 AM, Kevin Sweeney <kevints@apache.org>
>>> wrote:
>>>
>>>> Slaves persist their metadata (including attributes) across restarts
>>>> due to slave recovery (that's what allows you to upgrade mesos in-place
>>>> without killing the tasks they're managing). Unfortunately, to change
>>>> attributes you need to remove the persisted slave metadata (the "meta"
>>>> directory). This will kill all of a slave's underlying tasks, but the newly
>>>> registered slave should have the correct attributes.
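The cleanup step described above could be scripted roughly as follows. This is a sketch under assumptions: it supposes the "meta" directory sits directly under the slave's work directory (the thread only names the directory, not its location), that the slave process has already been stopped, and that losing the slave's tasks is acceptable. `clear_slave_metadata` is a hypothetical helper, and the `meta/slaves` layout in the demonstration is a placeholder.

```python
import shutil
import tempfile
from pathlib import Path


def clear_slave_metadata(work_dir: Path) -> bool:
    """Remove the persisted "meta" directory under a slave's work directory.

    Returns True if a directory was removed, False if there was nothing to remove.
    """
    meta_dir = work_dir / "meta"
    if not meta_dir.is_dir():
        return False
    shutil.rmtree(meta_dir)
    return True


# Demonstration against a throwaway directory rather than a real slave.
with tempfile.TemporaryDirectory() as tmp:
    demo_work_dir = Path(tmp)
    (demo_work_dir / "meta" / "slaves").mkdir(parents=True)  # placeholder layout
    removed_first = clear_slave_metadata(demo_work_dir)
    removed_second = clear_slave_metadata(demo_work_dir)
```

The second call returns False because the directory is already gone, so the helper is safe to run more than once.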
>>>>
>>>>
>>>> On Mon, Jul 14, 2014 at 11:26 AM, Kevin Burg <kburg@foursquare.com>
>>>> wrote:
>>>>
>>>>> I've confirmed by looking at that endpoint that new attributes are not
>>>>> being picked up and modified attributes are retaining their old values.
>>>>> This is after restarting both the slaves and the scheduler process.
>>>>>
>>>>>
>>>>> On Mon, Jul 14, 2014 at 11:09 AM, Josh Adams <josh@foursquare.com>
>>>>> wrote:
>>>>>
>>>>> > Thanks Brian. Kevin should have some followup questions shortly.
>>>>> >
>>>>> > Josh
>>>>> >
>>>>> >
>>>>> > On Mon, Jul 14, 2014 at 10:37 AM, Brian Wickman <wickman@apache.org>
>>>>> > wrote:
>>>>> >
>>>>> >> host/rack should not be treated specially.
>>>>> >>
>>>>> >> If you go to the "/slaves" endpoint on the scheduler UI, what does it
>>>>> >> report as attributes being exported by your slaves? You might want to
>>>>> >> validate there that the "staging" attribute got picked up properly. If
>>>>> >> it's not getting picked up (e.g. the attributes are getting cached
>>>>> >> incorrectly by the scheduler?) then you should file an issue.
>>>>> >>
>>>>> >>
>>>>> >> On Fri, Jul 11, 2014 at 5:24 PM, Kevin Burg <kburg@foursquare.com>
>>>>> >> wrote:
>>>>> >>
>>>>> >>> Hi,
>>>>> >>>
>>>>> >>> I'm having trouble getting the task constraint resolver to work with
>>>>> >>> attributes other than 'host' and 'rack.' Are arbitrary attribute keys in
>>>>> >>> the mesos slaves supported currently?
>>>>> >>>
>>>>> >>> Here is the setup.
>>>>> >>>
>>>>> >>> The slaves are configured to run with
>>>>> >>> `--attributes=host:<host>;rack:<rack>;staging:true`
>>>>> >>>
>>>>> >>> (I've also tried this with staging:1, and staging:foo)
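For reference, the `--attributes` value quoted above is a semicolon-separated list of `name:value` pairs. A minimal sketch of how such a string breaks down, treating every value as plain text (a simplification), with `parse_attributes` as a hypothetical helper and `node1`/`r1` as placeholder values for `<host>` and `<rack>`:

```python
from typing import Dict, Set


def parse_attributes(flag_value: str) -> Dict[str, Set[str]]:
    """Split a "name:value;name:value" string into a name -> set of values map."""
    attributes: Dict[str, Set[str]] = {}
    for pair in flag_value.split(";"):
        if not pair:
            continue  # tolerate a trailing or doubled semicolon
        name, _, value = pair.partition(":")
        attributes.setdefault(name.strip(), set()).add(value.strip())
    return attributes


parsed = parse_attributes("host:node1;rack:r1;staging:true")
```

Here `staging` ends up as an ordinary key next to `host` and `rack`, so a value constraint on `staging` with the value `true` is checked against the same kind of entry.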
>>>>> >>>
>>>>> >>> The constraint generated from the .aurora config looks like the following:
>>>>> >>> Constraint(name:staging, constraint:<TaskConstraint
>>>>> >>> value:ValueConstraint(negated:false, values:[true])>)
>>>>> >>>
>>>>> >>> The schedule request then gets vetoed with the following veto object:
>>>>> >>> Veto{reason=Constraint not satisfied: staging, score=1000,
>>>>> >>> valueMismatch=true}]
>>>>> >>>
>>>>> >>> The constraints generated for 'host' and 'rack' look identical except for
>>>>> >>> the different name of course. I've even tried bouncing every mesos and
>>>>> >>> aurora process on the machine to see if maybe stale attributes were being
>>>>> >>> assigned to the slaves. All the offers being made to the master look
>>>>> >>> correct though, which leads me to believe that the constraint solver just
>>>>> >>> doesn't work for arbitrary attributes.
>>>>> >>>
>>>>> >>> We would appreciate any help you can offer.
>>>>> >>>
>>>>> >>> Thanks,
>>>>> >>> Kevin
>>>>> >>>
>>>>> >>
>>>>> >>
>>>>> >
>>>>> >
>>>>> > --
>>>>> > ===============
>>>>> > josh adams
>>>>> > production engineer
>>>>> > foursquare
>>>>> >
>>>>> > (gv) 415-830-4106
>>>>> > ===============
>>>>> > foursquare.com/jobs
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>


