mesos-dev mailing list archives

From Yan Xu <xuj...@apple.com>
Subject Re: Offer operation reconciliation discussion notes
Date Wed, 23 Aug 2017 20:58:06 GMT
Yeah, a reason for failed operations is probably useful for all resource
operations. It looks like the task-style status update is still the best
approach.

---
@xujyan <https://twitter.com/xujyan>

On Wed, Aug 23, 2017 at 11:40 AM, Jie Yu <yujie.jay@gmail.com> wrote:

> We should continue the discussion here:
>
> I think I forgot to mention one important reason that I went for the
> operation-based reconciliation API proposal. For new operations like
> CREATE_VOLUME/CREATE_BLOCK, not only do we need to know the end result (the
> resources) if the operation succeeds, we also need to know the failure
> reason if it fails. For instance, imagine you're creating an EBS volume by
> talking to a CSI EBS plugin. Surfacing the creation error (e.g., whether the
> CSI plugin reports it as retryable or not) will be useful for the scheduler
> to determine the next step.
>
> I don't think a resources-based reconciliation API can address this. Maybe
> we can add both if we feel both are useful?
>
> Thoughts?
> - Jie
>
> On Wed, Aug 23, 2017 at 11:26 AM, Jie Yu <yujie.jay@gmail.com> wrote:
>
>> Hi,
>>
>> We had a discussion on a very early proposal (see the attached slides)
>> on providing feedback for offer operations (e.g., CREATE/DESTROY,
>> RESERVE/UNRESERVE, etc.) with a bunch of folks from the community. Here are
>> the notes I captured in the meeting:
>>
>>
>>    - One alternative approach discussed was to have best-effort
>>    feedback, and a resources-based reconciliation API allowing a framework
>>    to query the resources on a given resource provider or agent. That way,
>>    we don't necessarily need the status update mechanism for offer
>>    operations, which causes complexity in the frameworks.
>>    - In the current proposal, do we need agent_id (or resource provider
>>    id) when performing reconciliation for that operation? The reason we
>>    require that in the task reconciliation case is that the agent might
>>    not have re-registered yet during a master failover.
>>    - We need to mock up the operator API for this work.
>>    - What's the ordering guarantee for the operations specified in one API
>>    call?
>>    - Wish list
>>       - Reservations tied to a framework instead of a role.
>>       - When a framework is torn down, automatically release the resources
>>       reserved for that framework.
>>
>> If I missed anything, please reply to this thread! Thanks!
>>
>> https://docs.google.com/presentation/d/1Mef8K3aLIuzcFVc3MnAo64TkjpyTWarYVShtvCN4e48/edit?usp=sharing
>>
>> - Jie
>>
>
>
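
As a rough illustration of the task-style status update discussed in this
thread, here is a minimal sketch of how a scheduler might act on an operation
status that carries a failure reason. Every name in it (OperationState,
OperationStatus, next_step, the retryable flag, the returned action strings)
is hypothetical and is not part of any Mesos or CSI API; it only shows why
surfacing "retryable or not" helps the scheduler pick its next step.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class OperationState(Enum):
    """Hypothetical lifecycle states for an offer operation."""
    PENDING = "pending"
    FINISHED = "finished"
    FAILED = "failed"


@dataclass(frozen=True)
class OperationStatus:
    """Hypothetical status update for one offer operation.

    On success it carries the resulting resources; on failure it carries
    a human-readable reason and whether a retry might succeed.
    """
    operation_id: str
    state: OperationState
    resources: Tuple[str, ...] = ()
    reason: Optional[str] = None
    retryable: bool = False


def next_step(status: OperationStatus) -> str:
    """Decide what the scheduler does next (assumed policy, for illustration)."""
    if status.state is OperationState.FINISHED:
        return "use_resources"
    if status.state is OperationState.FAILED:
        # The failure reason alone is not enough; the scheduler also needs
        # to know whether trying again is worthwhile.
        return "retry" if status.retryable else "give_up"
    # Still pending: keep waiting (or reconcile later).
    return "wait"


if __name__ == "__main__":
    failed = OperationStatus(
        operation_id="op-1",
        state=OperationState.FAILED,
        reason="volume backend temporarily unavailable",
        retryable=True,
    )
    print(next_step(failed))
```

A resources-based query could tell the scheduler that the volume does not
exist, but not why the creation failed or whether retrying makes sense, which
is the gap the failure reason is meant to fill.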
