mesos-issues mailing list archives

From "Joris Van Remoortere (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MESOS-5545) Add rack awareness support for Mesos resources
Date Tue, 07 Jun 2016 15:58:21 GMT

    [ https://issues.apache.org/jira/browse/MESOS-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15318745#comment-15318745 ]

Joris Van Remoortere edited comment on MESOS-5545 at 6/7/16 3:58 PM:
---------------------------------------------------------------------

Hi [~fan.du].
Thanks for raising this topic and working on a design doc.
This topic has been discussed a few times before, although mostly during casual conversation.
It's great that you've captured and documented some ideas.

I would suggest that the next steps involve:
1. Raising this at the community sync to:
  - Get a sense of timeline.
  - Find a shepherd.
2. Iterating on the design with the shepherd and a working group.
3. Validating the design with a large user base. This is critical for a component change like this.
4. Then we can get to the patches.

The immediate feedback I can give is:
- Although this is a very fun and interesting project, it hasn't yet attracted enough interest
for us to follow through. I would focus the most on getting this prioritized on the roadmap.
- Mesos is about primitives. Your design doc mixes primitives (great) with some implementation
/ configuration bias (LLDP). I would work on separating general fault domain awareness (Mesos)
from the assignment of the attributes (Operator / automation).
- Take a step back and consider what other information we may want to associate with fault
domains in the future. Is there a structure that is more resilient to augmentation in the
future than an {{optional rack_id}}?
- How should schedulers use this information, and what actions may they take based upon it?
Have we thought out all the actions, and whether they would require changes to Mesos?
- You should clarify whether these attributes are expected to change over the lifetime of
an agent. For example, currently we don't allow resources or IPs to change. If this were also
true for fault domain attributes, it would simplify the implementation. If you feel that dynamic
attributes are necessary, then I would urge you to make that a phase 2 project and first work
with the community to agree on a common pattern for updating any attributes on the agent,
and how to surface consequential changes to both tasks and frameworks. (You may see why I
suggest static to begin with ;-) )







> Add rack awareness support for Mesos resources
> ----------------------------------------------
>
>                 Key: MESOS-5545
>                 URL: https://issues.apache.org/jira/browse/MESOS-5545
>             Project: Mesos
>          Issue Type: Story
>          Components: hadoop, master
>            Reporter: Fan Du
>         Attachments: RackAwarenessforMesos-Lite.pdf
>
>
> Resources managed by the Mesos master carry no topology information about the cluster, such as rack topology, while many data center applications offer rack awareness to provide data locality, fault tolerance, and intelligent task placement. This ticket investigates how to add rack awareness to Mesos resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
