aurora-reviews mailing list archives

From "Maxim Khutornenko" <ma...@apache.org>
Subject Re: Review Request 30710: add mesos role feature
Date Tue, 10 Feb 2015 03:57:01 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30710/#review71754
-----------------------------------------------------------



src/main/java/org/apache/aurora/scheduler/configuration/Resources.java
<https://reviews.apache.org/r/30710/#comment117619>

    I am afraid the current approach is not going to work in a cluster with Mesos slaves configured to offer multi-role resources. Consider an example where a slave is configured to offer both role-specific and general-pool CPUs:
    ```
    --resources=cpus(aurora):4;cpus(*):2
    ```
    
    The resources section of the corresponding offer could then look like this:
    ```
    ...
      "resources" :
      [
        {
          "name"        : "cpus",
          "type"        : "SCALAR",
          "scalar"      : { "value" : 4.0 },
          "role"        : "aurora"
        },
        {
          "name"        : "cpus",
          "type"        : "SCALAR",
          "scalar"      : { "value" : 2.0 },
          "role"        : "*"
        }
      ]
    ...
    ```
    Given the above, a task requiring 6.0 CPUs would need a TaskInfo whose resources section mirrors the offer (4.0 CPUs under role `aurora` and 2.0 CPUs under role `*`); otherwise Mesos would reject the task launch.
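
To make the constraint concrete, here is a minimal Java sketch (an illustration, not Aurora code) of how a scheduler could split one scalar request across the per-role entries of a single offer, drawing from the framework's own role first and then from the default `*` pool. `RoleResource` and `MultiRoleAllocator` are hypothetical stand-ins for the Mesos `Resource` protobuf and for whatever Aurora would actually implement; the point is only that the launched task must carry one resource entry per role it draws from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a scalar Mesos resource entry, printed as name(role):value. */
final class RoleResource {
  final String name;
  final String role;
  final double value;

  RoleResource(String name, String role, double value) {
    this.name = name;
    this.role = role;
    this.value = value;
  }

  @Override
  public String toString() {
    return name + "(" + role + "):" + value;
  }
}

/** Hypothetical helper for illustration; not an existing Aurora or Mesos API. */
final class MultiRoleAllocator {
  private static final double EPSILON = 1e-9;

  private MultiRoleAllocator() {}

  /**
   * Splits a request for {@code amount} units of {@code name} across the offered
   * entries, drawing from {@code frameworkRole} first and then from the "*" pool.
   * Returns one entry per role drawn from, i.e. what the TaskInfo would need to carry.
   */
  static List<RoleResource> allocate(
      List<RoleResource> offered, String name, String frameworkRole, double amount) {
    // Visit the "*" entries only once when the framework itself runs under "*".
    List<String> roleOrder = "*".equals(frameworkRole)
        ? Collections.singletonList("*")
        : Arrays.asList(frameworkRole, "*");
    List<RoleResource> result = new ArrayList<>();
    double remaining = amount;
    for (String role : roleOrder) {
      for (RoleResource r : offered) {
        if (remaining <= EPSILON) {
          break;
        }
        if (r.name.equals(name) && r.role.equals(role)) {
          double taken = Math.min(r.value, remaining);
          result.add(new RoleResource(name, role, taken));
          remaining -= taken;
        }
      }
    }
    if (remaining > EPSILON) {
      throw new IllegalArgumentException(
          "Offer cannot cover " + amount + " " + name + "; short by " + remaining);
    }
    return result;
  }
}
```

For the offer above (`cpus(aurora):4` and `cpus(*):2`), `allocate(offer, "cpus", "aurora", 6.0)` returns `[cpus(aurora):4.0, cpus(*):2.0]`, mirroring the offer, and a 5.0 CPU request returns `[cpus(aurora):4.0, cpus(*):1.0]`. Drawing from the reserved role first is just one plausible policy; a real implementation might order roles differently.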
    
    The current state of Aurora, where the framework role is not configurable, gives us the "luxury" of not supporting multi-role allocations. However, if we are going to support role-based allocations, I think we should do it properly and fully support both single-role and multi-role assignments. This will be especially critical once Mesos dynamic reservation [1] lands.
    
    [1] - https://docs.google.com/document/d/1e3j69pfBgtc8xM00DhcuiMl6ImkEB5na0TzOMyzrg8A/edit#


- Maxim Khutornenko


On Feb. 10, 2015, 12:49 a.m., lozhang@ebay.com zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30710/
> -----------------------------------------------------------
> 
> (Updated Feb. 10, 2015, 12:49 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Bill Farner.
> 
> 
> Bugs: AURORA-1109
>     https://issues.apache.org/jira/browse/AURORA-1109
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> ## Problems
> 
> We are from the eBay platform team. Previously, we used Marathon to launch Jenkins master instances on dedicated VMs and to receive resource offers from those same dedicated VMs. For details, please refer to
> http://www.ebaytechblog.com/2014/04/04/delivering-ebays-ci-solution-with-apache-mesos-part-i/#.VNQUuC6_SPU
> 
> We have found Aurora to be more stable and powerful, so we are moving from Marathon to Aurora. During the move, we found that Aurora currently has no Mesos role support, but we need Mesos roles to solve the problem described in the section "Frameworks stopped receiving offers after a while" of the post linked above.
> 
> Here is a snippet of the problem description:
> 
> *[A problem] we noticed occurred after we used Marathon to create the initial set of CI masters. As those CI masters started registering themselves as frameworks, Marathon stopped receiving any offers from Mesos; essentially, no new CI masters could be launched. Let’s start with Marathon. In the DRF model, it was unfair to treat Marathon in the same bucket/role alongside hundreds of connected Jenkins frameworks. After launching all these Jenkins frameworks, Marathon had a large resource share and Mesos would aggressively offer resources to frameworks that were using little or no resources. Marathon was placed last in priority and got starved out.*
> 
> *We decided to define a dedicated Mesos role for Marathon and to have all of the Mesos slaves that were reserved for Jenkins master instances support that Mesos role. Jenkins frameworks were left with the default role “`*`”. This solved the problem – Mesos offered resources per role and hence Marathon never got starved out. A framework with a special role will get resource offers from both slaves supporting that special role and also from the default role “`*`”. However, since we were using placement constraints, Marathon accepted resource offers only from slaves that supported both the role and the placement constraints.*
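
The starvation and the fix described above can be illustrated with a deliberately simplified Java sketch of Dominant Resource Fairness (DRF) offer ordering. `FrameworkUsage`, `DrfOfferSketch` and the cluster numbers are hypothetical; the sketch assumes, as the post states, that unreserved (`*`) resources may be offered to any framework while resources reserved for a role are offered only to frameworks registered with that role, and that the eligible framework with the lowest dominant share is offered first.

```java
import java.util.List;

/** Hypothetical record of what one framework currently holds. */
final class FrameworkUsage {
  final String name;
  final String role;
  final double cpus;
  final double memMb;

  FrameworkUsage(String name, String role, double cpus, double memMb) {
    this.name = name;
    this.role = role;
    this.cpus = cpus;
    this.memMb = memMb;
  }
}

/** Simplified illustration only; this is not the Mesos allocator. */
final class DrfOfferSketch {
  private DrfOfferSketch() {}

  /** DRF dominant share: the largest fraction of any single cluster resource held. */
  static double dominantShare(FrameworkUsage f, double totalCpus, double totalMemMb) {
    return Math.max(f.cpus / totalCpus, f.memMb / totalMemMb);
  }

  /**
   * Returns the framework that would be offered a resource carrying {@code resourceRole}.
   * Unreserved ("*") resources may go to any framework; reserved resources only to
   * frameworks registered with that role. Among eligible frameworks the lowest dominant
   * share wins, with ties keeping the earlier framework. Returns null if none is eligible.
   */
  static String recipientFor(
      String resourceRole, List<FrameworkUsage> frameworks, double totalCpus, double totalMemMb) {
    FrameworkUsage best = null;
    double bestShare = Double.POSITIVE_INFINITY;
    for (FrameworkUsage f : frameworks) {
      boolean eligible = "*".equals(resourceRole) || resourceRole.equals(f.role);
      if (!eligible) {
        continue;
      }
      double share = dominantShare(f, totalCpus, totalMemMb);
      if (share < bestShare) {
        bestShare = share;
        best = f;
      }
    }
    return best == null ? null : best.name;
  }
}
```

With, say, Marathon holding 20 of 100 CPUs and each Jenkins master holding 1, an unreserved resource always goes to a Jenkins framework, which is the starvation the post describes, while a resource reserved for the `marathon` role can only go to Marathon. The actual Mesos allocator is hierarchical (it orders roles first, then frameworks within a role) and also applies weights and offer filters, none of which this sketch models.
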
> ## Solution
> 
> So we added a role feature to the source code to solve the problem in the same way: when accepting a resource offer, Aurora sends the needed resources back to Mesos tagged with the Mesos role carried in the resource offer.
> 
> How to configure the Mesos role:
> 1. Add the command-line option `--mesos_role=${Mesos role name}` when starting the Aurora scheduler.
> 
> We changed the test cases to match the code change; each changed test case passes.
> The changes can be merged from https://github.com/zhanglong2015/incubator-aurora
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/ResourceSlot.java 1a158b4e0be94762ad0480e8ce74b19bacc90c97
>   src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java 31aa2bbaab3d97875493ad75c4d2c7c82ac7fa58
>   src/main/java/org/apache/aurora/scheduler/configuration/Resources.java b5a3140e3560f790d1db496dca3c2ee0dc96a195
>   src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java d0994203b5650f44ca2eb32e1e2aa61875163854
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java 5340d651b298ec8aa079e73d6d2f652fdf876293
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java e1c29747c9854cf75bf63f6f085cf40ca68989af
>   src/test/java/org/apache/aurora/scheduler/async/GcExecutorLauncherTest.java 422d5a9a42310979752eb7282658316c2b772419
>   src/test/java/org/apache/aurora/scheduler/configuration/ResourcesTest.java d6febb8998e05257cabe8d193cefa0b6c79f197e
>   src/test/java/org/apache/aurora/scheduler/mesos/MesosTaskFactoryImplTest.java 5f08d00d39f016af9bc296e517ad49b66ab5a8de
>   src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 411a55a8d85f60bb2703468f2d69b64b2736eee4

> 
> Diff: https://reviews.apache.org/r/30710/diff/
> 
> 
> Testing
> -------
> 
> :buildSrc:compileJava UP-TO-DATE
> :buildSrc:compileGroovy UP-TO-DATE
> :buildSrc:processResources UP-TO-DATE
> :buildSrc:classes UP-TO-DATE
> :buildSrc:jar UP-TO-DATE
> :buildSrc:assemble UP-TO-DATE
> :buildSrc:compileTestJava UP-TO-DATE
> :buildSrc:compileTestGroovy UP-TO-DATE
> :buildSrc:processTestResources UP-TO-DATE
> :buildSrc:testClasses UP-TO-DATE
> :buildSrc:test UP-TO-DATE
> :buildSrc:check UP-TO-DATE
> :buildSrc:build UP-TO-DATE
> :api:generateThriftJava
> :api:classesThrift
> Note: Some input files use unchecked or unsafe operations.
> Note: Recompile with -Xlint:unchecked for details.
> :api:checkPython
> :api:generateThriftEntitiesJava
> :api:classesThriftEntities
> :api:compileJava UP-TO-DATE
> :api:generateThriftResources
> :api:processResources UP-TO-DATE
> :api:classes
> :api:jar
> :compileJava
> Note: Writing file:/root/incubator-aurora/dist/classes/main/com/twitter/common/args/apt/cmdline.arg.info.txt.2
> :processResources
> :classes
> :jar
> :assemble
> :compileJmhJava
> warning: Supported source version 'RELEASE_6' from annotation processor 'org.openjdk.jmh.generators.BenchmarkProcessor' less than -source '1.7'
> 1 warning
> :processJmhResources UP-TO-DATE
> :jmhClasses
> :checkstyleJmh SKIPPED
> :jsHint UP-TO-DATE
> :checkstyleMain SKIPPED
> :compileTestJava
> :processTestResources
> :testClasses
> :checkstyleTest SKIPPED
> :findbugsJmh SKIPPED
> :findbugsMain SKIPPED
> :findbugsTest SKIPPED
> :licenseJmh SKIPPED
> :licenseMain SKIPPED
> :licenseTest SKIPPED
> :license UP-TO-DATE
> :pmdMain SKIPPED
> :test
> :jacocoTestReport
> Coverage report generated: file:///root/incubator-aurora/dist/reports/jacoco/test/html/index.html
> :analyzeReport
> Instruction coverage of 0.891119619012046 exceeds minimum coverage of 0.89.
> Branch coverage is 0.834349593495935, but must be greater than 0.835
> :check
> :build
> :api:assemble
> :api:compileTestJava UP-TO-DATE
> :api:processTestResources UP-TO-DATE
> :api:testClasses UP-TO-DATE
> :api:test UP-TO-DATE
> :api:check UP-TO-DATE
> :api:build
> :buildSrc:compileJava UP-TO-DATE
> :buildSrc:processResources UP-TO-DATE
> :buildSrc:classes UP-TO-DATE
> :buildSrc:jar
> :buildSrc:assemble
> :buildSrc:compileTestJava UP-TO-DATE
> :buildSrc:processTestResources UP-TO-DATE
> :buildSrc:testClasses UP-TO-DATE
> :buildSrc:test UP-TO-DATE
> :buildSrc:check UP-TO-DATE
> :buildSrc:build
> 
> BUILD SUCCESSFUL
> 
> Total time: 2 mins 26.403 secs
> 
> 
> Thanks,
> 
> lozhang@ebay.com zhang
> 
>

