flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1984) Integrate Flink with Apache Mesos
Date Wed, 17 Aug 2016 15:36:20 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15424729#comment-15424729 ]

ASF GitHub Bot commented on FLINK-1984:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2315#discussion_r75147221
  
    --- Diff: flink-mesos/src/main/scala/org/apache/flink/mesos/scheduler/LaunchCoordinator.scala ---
    @@ -0,0 +1,349 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.mesos.scheduler
    +
    +import akka.actor.{Actor, ActorRef, FSM, Props}
    +import com.netflix.fenzo._
    +import com.netflix.fenzo.functions.Action1
    +import com.netflix.fenzo.plugins.VMLeaseObject
    +import grizzled.slf4j.Logger
    +import org.apache.flink.api.java.tuple.{Tuple2=>FlinkTuple2}
    +import org.apache.flink.configuration.Configuration
    +import org.apache.flink.mesos.scheduler.LaunchCoordinator._
    +import org.apache.flink.mesos.scheduler.messages._
    +import org.apache.mesos.Protos.TaskInfo
    +import org.apache.mesos.{SchedulerDriver, Protos}
    +
    +import scala.collection.JavaConverters._
    +import scala.collection.mutable.{Map => MutableMap}
    +import scala.concurrent.duration._
    +
    +/**
    +  * The launch coordinator handles offer processing, including
    +  * matching offers to tasks and making reservations.
    +  *
    +  * The coordinator uses Netflix Fenzo to optimize task placement. During the GatheringOffers phase,
    +  * offers are evaluated by Fenzo for suitability to the planned tasks. Reservations are then placed
    +  * against the best offers, leading to revised offers containing reserved resources with which to launch task(s).
    +  */
    +class LaunchCoordinator(
    +    manager: ActorRef,
    +    config: Configuration,
    +    schedulerDriver: SchedulerDriver,
    +    optimizerBuilder: TaskSchedulerBuilder
    +  ) extends Actor with FSM[TaskState, GatherData] {
    +
    +  val LOG = Logger(getClass)
    +
    +  /**
    +    * The task placement optimizer.
    +    *
    +    * The optimizer contains the following state:
    +    *  - unused offers
    +    *  - existing task placement (for fitness calculation involving task colocation)
    +    */
    +  private[mesos] val optimizer: TaskScheduler = {
    +    optimizerBuilder
    +      .withLeaseRejectAction(new Action1[VirtualMachineLease]() {
    +        def call(lease: VirtualMachineLease) {
    +          LOG.info(s"Declined offer ${lease.getId} from ${lease.hostname()} of ${lease.memoryMB()} MB, ${lease.cpuCores()} cpus.")
    +          schedulerDriver.declineOffer(lease.getOffer.getId)
    +        }
    +      }).build
    +  }
    +
    +  override def postStop(): Unit = {
    +    optimizer.shutdown()
    +    super.postStop()
    +  }
    +
    +  /**
    +    * Initial state
    +    */
    +  startWith(Suspended, GatherData(tasks = Nil, newLeases = Nil))
    +
    +  /**
    +    * State: Suspended
    +    *
    +    * Wait for (re-)connection to Mesos. No offers exist in this state, but outstanding tasks might.
    +    */
    +  when(Suspended) {
    +    case Event(msg: Connected, data: GatherData) =>
    +      if(data.tasks.nonEmpty) goto(GatheringOffers)
    +      else goto(Idle)
    +  }
    +
    +  /**
    +    * State: Idle
    +    *
    +    * Wait for a task request to arrive, then transition into gathering offers.
    +    */
    +  onTransition {
    +    case _ -> Idle => assert(nextStateData.tasks.isEmpty)
    +  }
    +
    +  when(Idle) {
    +    case Event(msg: Disconnected, data: GatherData) =>
    +      goto(Suspended)
    +
    +    case Event(offers: ResourceOffers, data: GatherData) =>
    +      // decline any offers that come in
    +      schedulerDriver.suppressOffers()
    +      for(offer <- offers.offers().asScala) { schedulerDriver.declineOffer(offer.getId) }
    +      stay()
    +
    +    case Event(msg: Launch, data: GatherData) =>
    +      goto(GatheringOffers) using data.copy(tasks = data.tasks ++ msg.tasks.asScala)
    +  }
    +
    +  /**
    +    * Transition logic to control the flow of offers.
    +    */
    +  onTransition {
    +    case _ -> GatheringOffers =>
    +      LOG.info(s"Now gathering offers for at least ${nextStateData.tasks.length} task(s).")
    +      schedulerDriver.reviveOffers()
    +
    +    case GatheringOffers -> _ =>
    +      // decline any outstanding offers and suppress future offers
    +      LOG.info(s"No longer gathering offers; all requests fulfilled.")
    +
    +      assert(nextStateData.newLeases.isEmpty)
    +      schedulerDriver.suppressOffers()
    +      optimizer.expireAllLeases()
    +  }
    +
    +  /**
    +    * State: GatheringOffers
    +    *
    +    * Wait for offers to accumulate for a fixed length of time or from specific slaves.
    --- End diff --
    
    Why do we wait for offers to accumulate? Why not start tasks as soon as we get offers?
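    For illustration only (this is not Flink or Fenzo code, and the names below are
    made up): the trade-off behind gathering offers can be sketched as a best-fit vs.
    first-fit placement. With a batch of offers in hand, an optimizer like Fenzo can
    choose the tightest fit, whereas launching on the first offer that arrives may
    consume a large offer for a small task.

    ```scala
    // Hypothetical sketch of the batching trade-off; Offer/Task stand in for
    // Mesos offers and Flink task requests.
    case class Offer(id: String, cpus: Double)
    case class Task(id: String, cpus: Double)

    // Greedy: place the task on the first offer that fits, as offers arrive.
    def firstFit(offers: List[Offer], task: Task): Option[Offer] =
      offers.find(_.cpus >= task.cpus)

    // Batched: with all gathered offers visible, choose the tightest fit.
    def bestFit(offers: List[Offer], task: Task): Option[Offer] =
      offers.filter(_.cpus >= task.cpus).sortBy(_.cpus).headOption

    val offers = List(Offer("big", 8.0), Offer("small", 1.0))
    val task   = Task("tm-1", 1.0)

    // first-fit consumes the 8-cpu offer; best-fit keeps it free for larger tasks
    println(firstFit(offers, task).map(_.id))  // Some(big)
    println(bestFit(offers, task).map(_.id))   // Some(small)
    ```
    
    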


> Integrate Flink with Apache Mesos
> ---------------------------------
>
>                 Key: FLINK-1984
>                 URL: https://issues.apache.org/jira/browse/FLINK-1984
>             Project: Flink
>          Issue Type: New Feature
>          Components: Cluster Management
>            Reporter: Robert Metzger
>            Assignee: Eron Wright 
>            Priority: Minor
>         Attachments: 251.patch
>
>
> Several users have asked for an integration of Flink with Mesos.
> -There also is a pending pull request for adding Mesos support for Flink-: https://github.com/apache/flink/pull/251
> Update (May '16): a new effort is now underway, building on the recent ResourceManager work.
> Design document:  ([google doc|https://docs.google.com/document/d/1WItafBmGbjlaBbP8Of5PAFOH9GUJQxf5S4hjEuPchuU/edit?usp=sharing])



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
