mesos-commits mailing list archives

From b...@apache.org
Subject svn commit: r1359445 - in /incubator/mesos/trunk: ./ frameworks/mpi/ mpi/
Date Mon, 09 Jul 2012 23:58:40 GMT
Author: benh
Date: Mon Jul  9 23:58:39 2012
New Revision: 1359445

URL: http://svn.apache.org/viewvc?rev=1359445&view=rev
Log:
Updated MPI framework and included it in the distributions (contributed by Harvey Feng, https://reviews.apache.org/r/4768).

Added:
    incubator/mesos/trunk/mpi/
    incubator/mesos/trunk/mpi/README
    incubator/mesos/trunk/mpi/mpiexec-mesos.in
    incubator/mesos/trunk/mpi/mpiexec-mesos.py   (with props)
Removed:
    incubator/mesos/trunk/frameworks/mpi/README.txt
    incubator/mesos/trunk/frameworks/mpi/nmpiexec
    incubator/mesos/trunk/frameworks/mpi/nmpiexec.py
    incubator/mesos/trunk/frameworks/mpi/startmpd.py
    incubator/mesos/trunk/frameworks/mpi/startmpd.sh
Modified:
    incubator/mesos/trunk/Makefile.am
    incubator/mesos/trunk/configure.ac

Modified: incubator/mesos/trunk/Makefile.am
URL: http://svn.apache.org/viewvc/incubator/mesos/trunk/Makefile.am?rev=1359445&r1=1359444&r2=1359445&view=diff
==============================================================================
--- incubator/mesos/trunk/Makefile.am (original)
+++ incubator/mesos/trunk/Makefile.am Mon Jul  9 23:58:39 2012
@@ -39,6 +39,10 @@ EXTRA_DIST += configure.amazon-linux-64 
   configure.macosx configure.ubuntu-lucid-64 configure.ubuntu-natty-64
 
 
+# MPI framework.
+EXTRA_DIST += mpi/README mpi/mpiexec-mesos.in mpi/mpiexec-mesos.py
+
+
 if HAS_JAVA
 maven-install:
 	@cd src && $(MAKE) $(AM_MAKEFLAGS) maven-install

Modified: incubator/mesos/trunk/configure.ac
URL: http://svn.apache.org/viewvc/incubator/mesos/trunk/configure.ac?rev=1359445&r1=1359444&r2=1359445&view=diff
==============================================================================
--- incubator/mesos/trunk/configure.ac (original)
+++ incubator/mesos/trunk/configure.ac Mon Jul  9 23:58:39 2012
@@ -100,6 +100,8 @@ AC_CONFIG_FILES([include/mesos/mesos.hpp
 
 AC_CONFIG_FILES([src/java/generated/org/apache/mesos/MesosNativeLibrary.java])
 
+AC_CONFIG_FILES([mpi/mpiexec-mesos], [chmod +x mpi/mpiexec-mesos])
+
 
 AC_ARG_ENABLE([java],
               AS_HELP_STRING([--disable-java],

Added: incubator/mesos/trunk/mpi/README
URL: http://svn.apache.org/viewvc/incubator/mesos/trunk/mpi/README?rev=1359445&view=auto
==============================================================================
--- incubator/mesos/trunk/mpi/README (added)
+++ incubator/mesos/trunk/mpi/README Mon Jul  9 23:58:39 2012
@@ -0,0 +1,59 @@
+Mesos MPICH2 framework readme
+--------------------------------------------
+
+Table of Contents:
+1) Installing MPICH2
+2) Running the Mesos MPICH2 framework
+
+=====================
+1) INSTALLING MPICH2:
+=====================
+- This framework was developed for MPICH2 1.2 (mpd was deprecated
+  starting with 1.3) on Linux (Ubuntu 11.10) and OS X Lion.
+
+- You can install MPICH2 from scratch. MPICH2, along with
+  installation directions, is available at
+  http://www.mcs.anl.gov/research/projects/mpich2/. This README
+  follows those directions: unpack the tar.gz and follow the steps
+  below.
+
+- To use MPI with Mesos, make sure to have MPICH2 installed on every
+  machine in your cluster.
+
+Setting up:
+-> Install and configure:
+mac    :  ./configure --prefix=/Users/_your_username_/mpich2-install
+ubuntu :  ./configure --prefix=/home/_your_username_/mpich2-install
+	  Then...
+	  sudo make
+	  sudo make install
+
+
+-> Optional: add the MPICH2 binaries to PATH. Alternatively, you can
+   specify the path to the installed MPICH2 binaries with
+   mpiexec-mesos's '--path' option.
+mac    :  sudo vim ~/.bash_profile
+          export PATH=/Users/_your_username_/mpich2-install/bin:$PATH
+ubuntu :  sudo vim ~/.bashrc
+          export PATH=/home/_your_username_/mpich2-install/bin:$PATH
+
+-> Create a .mpd.conf file in your home directory:
+          echo "secretword=nil" > ~/.mpd.conf
+          chmod 600 ~/.mpd.conf
+
+-> Check the installation - these should all resolve to the PATH set above:
+          which mpd
+          which mpiexec
+          which mpirun
+
+
+=====================================
+2) RUNNING THE MESOS MPICH2 FRAMEWORK
+=====================================
+
+Using/testing mpiexec-mesos:
+-> Start a Mesos master and slaves
+
+-> How to run a Hello, World! program (pass the -h flag to see help options):
+      mpicc helloworld.c -o helloworld
+      ./mpiexec-mesos 127.0.0.1:5050 ./helloworld
+   Paths to mesos, protobuf, and distribute eggs can be specified by setting
+   respective environment variables in mpiexec-mesos.
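The ~/.mpd.conf step in the README above is easy to get wrong: mpd refuses a conf file that is group- or world-readable, which is why the `chmod 600` matters. A minimal Python sketch of that setup step, assuming a POSIX system; the helper name is ours for illustration and is not part of the framework:

```python
import os
import stat


def write_mpd_conf(home, secretword="nil"):
    """Write an .mpd.conf that mpd will accept: it must contain a
    secretword line and be readable only by its owner (mode 600)."""
    path = os.path.join(home, ".mpd.conf")
    with open(path, "w") as f:
        f.write("secretword=%s\n" % secretword)
    # Equivalent of `chmod 600`: owner read/write only.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path
```

In practice you would call `write_mpd_conf(os.path.expanduser("~"))` once per cluster machine, since every host in the mpd ring needs the same secret word.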

Added: incubator/mesos/trunk/mpi/mpiexec-mesos.in
URL: http://svn.apache.org/viewvc/incubator/mesos/trunk/mpi/mpiexec-mesos.in?rev=1359445&view=auto
==============================================================================
--- incubator/mesos/trunk/mpi/mpiexec-mesos.in (added)
+++ incubator/mesos/trunk/mpi/mpiexec-mesos.in Mon Jul  9 23:58:39 2012
@@ -0,0 +1,43 @@
+#!/bin/sh
+
+# This script uses MESOS_SOURCE_DIR and MESOS_BUILD_DIR which come
+# from configuration substitutions.
+MESOS_SOURCE_DIR=@abs_top_srcdir@
+MESOS_BUILD_DIR=@abs_top_builddir@
+
+# Use colors for errors.
+. ${MESOS_SOURCE_DIR}/support/colors.sh
+
+# Force the use of the Python interpreter configured during building.
+test ! -z "${PYTHON}" && \
+  echo "${RED}Ignoring PYTHON environment variable (using @PYTHON@)${NORMAL}"
+
+PYTHON=@PYTHON@
+
+DISTRIBUTE_EGG=`echo ${MESOS_BUILD_DIR}/third_party/distribute-*/dist/*.egg`
+
+test ! -e ${DISTRIBUTE_EGG} && \
+  echo "${RED}Failed to find ${DISTRIBUTE_EGG}${NORMAL}" && \
+  exit 1
+
+PROTOBUF=${MESOS_BUILD_DIR}/third_party/protobuf-2.4.1
+PROTOBUF_EGG=`echo ${PROTOBUF}/python/dist/protobuf*.egg`
+
+test ! -e ${PROTOBUF_EGG} && \
+  echo "${RED}Failed to find ${PROTOBUF_EGG}${NORMAL}" && \
+  exit 1
+
+MESOS_EGG=`echo ${MESOS_BUILD_DIR}/src/python/dist/mesos*.egg`
+
+test ! -e ${MESOS_EGG} && \
+  echo "${RED}Failed to find ${MESOS_EGG}${NORMAL}" && \
+  exit 1
+
+SCRIPT=`dirname ${0}`/mpiexec-mesos.py
+
+test ! -e ${SCRIPT} && \
+  echo "${RED}Failed to find ${SCRIPT}${NORMAL}" && \
+  exit 1
+
+PYTHONPATH="${DISTRIBUTE_EGG}:${MESOS_EGG}:${PROTOBUF_EGG}" \
+  exec ${PYTHON} ${SCRIPT} "${@}"
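The wrapper above locates each egg with a shell glob, aborts if any is missing, and stacks the results onto PYTHONPATH before exec'ing the Python script. The same discovery logic, sketched in Python purely for illustration (the function names are ours; the fixed protobuf-2.4.1 layout mirrors the wrapper):

```python
import glob
import os
import sys


def find_egg(pattern):
    """Return the first egg matching the glob pattern, or exit with
    an error, as the shell wrapper's `test ! -e` checks do."""
    matches = glob.glob(pattern)
    if not matches:
        sys.exit("Failed to find %s" % pattern)
    return matches[0]


def build_pythonpath(build_dir):
    # Same three eggs the wrapper looks for in the build tree.
    eggs = [
        find_egg(os.path.join(build_dir, "third_party/distribute-*/dist/*.egg")),
        find_egg(os.path.join(build_dir, "src/python/dist/mesos*.egg")),
        find_egg(os.path.join(build_dir,
                              "third_party/protobuf-2.4.1/python/dist/protobuf*.egg")),
    ]
    return ":".join(eggs)
```

Because `PYTHONPATH` is set only for the `exec`'d interpreter, the caller's environment is left untouched, which is the same design choice the shell wrapper makes.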

Added: incubator/mesos/trunk/mpi/mpiexec-mesos.py
URL: http://svn.apache.org/viewvc/incubator/mesos/trunk/mpi/mpiexec-mesos.py?rev=1359445&view=auto
==============================================================================
--- incubator/mesos/trunk/mpi/mpiexec-mesos.py (added)
+++ incubator/mesos/trunk/mpi/mpiexec-mesos.py Mon Jul  9 23:58:39 2012
@@ -0,0 +1,216 @@
+#!/usr/bin/env python
+
+import mesos
+import mesos_pb2
+import os
+import sys
+import time
+import re
+import threading
+
+from optparse import OptionParser
+from subprocess import *
+
+
+def mpiexec():
+  print "We've launched all our MPDs; waiting for them to come up"
+
+  while countMPDs() <= TOTAL_MPDS:
+    print "...waiting on MPD(s)..."
+    time.sleep(1)
+  print "Got %d mpd(s), running mpiexec" % TOTAL_MPDS
+
+  try:
+    print "Running mpiexec"
+    call([MPICH2PATH + 'mpiexec', '-1', '-n', str(TOTAL_MPDS)] + MPI_PROGRAM)
+
+  except OSError,e:
+    print >> sys.stderr, "Error executing mpiexec"
+    print >> sys.stderr, e
+    exit(2)
+
+  print "mpiexec completed, calling mpdallexit %s" % MPD_PID
+
+  # Ring/slave mpd daemons will be killed on executor's shutdown() if
+  # framework scheduler fails to call 'mpdallexit'.
+  call([MPICH2PATH + 'mpdallexit', MPD_PID])
+
+
+class MPIScheduler(mesos.Scheduler):
+
+  def __init__(self, options, ip, port):
+    self.mpdsLaunched = 0
+    self.mpdsFinished = 0
+    self.ip = ip
+    self.port = port
+    self.options = options
+    self.startedExec = False
+
+  def registered(self, driver, fid, masterInfo):
+    print "Mesos MPI scheduler and mpd running at %s:%s" % (self.ip, self.port)
+    print "Registered with framework ID %s" % fid.value
+
+  def resourceOffers(self, driver, offers):
+    print "Got %d resource offers" % len(offers)
+
+    for offer in offers:
+      print "Considering resource offer %s from %s" % (offer.id.value, offer.hostname)
+
+      if self.mpdsLaunched == TOTAL_MPDS:
+        print "Declining permanently because we have already launched enough tasks"
+        driver.declineOffer(offer.id)
+        continue
+
+      cpus = 0
+      mem = 0
+      tasks = []
+
+      for resource in offer.resources:
+        if resource.name == "cpus":
+          cpus = resource.scalar.value
+        elif resource.name == "mem":
+          mem = resource.scalar.value
+
+      if cpus < CPUS or mem < MEM:
+        print "Declining offer due to too few resources"
+        driver.declineOffer(offer.id)
+      else:
+        tid = self.mpdsLaunched
+        self.mpdsLaunched += 1
+
+        print "Accepting offer on %s to start mpd %d" % (offer.hostname, tid)
+
+        task = mesos_pb2.TaskInfo()
+        task.task_id.value = str(tid)
+        task.slave_id.value = offer.slave_id.value
+        task.name = "task %d " % tid
+
+        cpus = task.resources.add()
+        cpus.name = "cpus"
+        cpus.type = mesos_pb2.Value.SCALAR
+        cpus.scalar.value = CPUS
+
+        mem = task.resources.add()
+        mem.name = "mem"
+        mem.type = mesos_pb2.Value.SCALAR
+        mem.scalar.value = MEM
+
+        task.command.value = "%smpd --noconsole --ncpus=%d --host=%s --port=%s" % (
+          MPICH2PATH, CPUS, self.ip, self.port)
+
+        tasks.append(task)
+
+        print "Replying to offer: launching mpd %d on host %s" % (tid, offer.hostname)
+        driver.launchTasks(offer.id, tasks)
+
+    if not self.startedExec and self.mpdsLaunched == TOTAL_MPDS:
+      threading.Thread(target = mpiexec).start()
+      self.startedExec = True
+
+  def statusUpdate(self, driver, update):
+    print "Task %s in state %s" % (update.task_id.value, update.state)
+    if (update.state == mesos_pb2.TASK_FAILED or
+        update.state == mesos_pb2.TASK_KILLED or
+        update.state == mesos_pb2.TASK_LOST):
+      print "A task finished unexpectedly, calling mpdexit on %s" % MPD_PID
+      call([MPICH2PATH + "mpdexit", MPD_PID])
+      driver.stop()
+    if (update.state == mesos_pb2.TASK_FINISHED):
+      self.mpdsFinished += 1
+      if self.mpdsFinished == TOTAL_MPDS:
+        print "All tasks done, all mpd's closed, exiting"
+        driver.stop()
+
+
+def countMPDs():
+  try:
+    mpdtraceproc = Popen(MPICH2PATH + "mpdtrace -l", shell=True, stdout=PIPE)
+    mpdtraceline = mpdtraceproc.communicate()[0]
+    return mpdtraceline.count("\n")
+  except OSError,e:
+    print >>sys.stderr, "Error starting mpd or mpdtrace"
+    print >>sys.stderr, e
+    exit(2)
+
+
+def parseIpPort(s):
+  ba = re.search("_([^ ]*) \(([^)]*)\)", s)
+  ip = ba.group(2)
+  port = ba.group(1)
+  return (ip, port)
+
+
+if __name__ == "__main__":
+  parser = OptionParser(usage="Usage: %prog [options] mesos_master mpi_program")
+  parser.disable_interspersed_args()
+  parser.add_option("-n", "--num",
+                    help="number of mpd's to allocate (default 1)",
+                    dest="num", type="int", default=1)
+  parser.add_option("-c", "--cpus",
+                    help="number of cpus per mpd (default 1)",
+                    dest="cpus", type="int", default=1)
+  parser.add_option("-m","--mem",
+                    help="number of MB of memory per mpd (default 1GB)",
+                    dest="mem", type="int", default=1024)
+  parser.add_option("--name",
+                    help="framework name", dest="name", type="string")
+  parser.add_option("-p","--path",
+                    help="path to look for MPICH2 binaries (mpd, mpiexec, etc.)",
+                    dest="path", type="string", default="")
+  parser.add_option("--ifhn-master",
+                    help="alt. interface hostname for what mpd is running on (for scheduler)",
+                    dest="ifhn_master", type="string")
+
+  (options, args) = parser.parse_args()
+  if len(args) < 2:
+    print >> sys.stderr, "At least two parameters required."
+    print >> sys.stderr, "Use --help to show usage."
+    exit(2)
+
+  TOTAL_MPDS = options.num
+  CPUS = options.cpus
+  MEM = options.mem
+  MPI_PROGRAM = args[1:]
+
+  # Give options.path a trailing '/', if it doesn't have one already.
+  MPICH2PATH = os.path.join(options.path, "")
+
+  print "Connecting to Mesos master %s" % args[0]
+
+  try:
+    mpd_cmd = MPICH2PATH + "mpd"
+    mpdtrace_cmd = MPICH2PATH + "mpdtrace -l"
+
+    if options.ifhn_master is not None:
+      call([mpd_cmd, "--daemon", "--ifhn=" + options.ifhn_master])
+    else:
+      call([mpd_cmd, "--daemon"])
+
+    mpdtraceproc = Popen(mpdtrace_cmd, shell=True, stdout=PIPE)
+    mpdtraceout = mpdtraceproc.communicate()[0]
+
+  except OSError,e:
+    print >> sys.stderr, "Error starting mpd or mpdtrace"
+    print >> sys.stderr, e
+    exit(2)
+
+  (ip,port) = parseIpPort(mpdtraceout)
+
+  MPD_PID = mpdtraceout.split(" ")[0]
+  print "MPD_PID is %s" % MPD_PID
+
+  scheduler = MPIScheduler(options, ip, port)
+
+  framework = mesos_pb2.FrameworkInfo()
+  framework.user = ""
+
+  if options.name is not None:
+    framework.name = options.name
+  else:
+    framework.name = "MPI: %s" % MPI_PROGRAM[0]
+
+  driver = mesos.MesosSchedulerDriver(
+    scheduler,
+    framework,
+    args[0])
+  sys.exit(0 if driver.run() == mesos_pb2.DRIVER_STOPPED else 1)
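parseIpPort in the script above relies on each `mpdtrace -l` output line having the shape `hostname_port (ip)`; the leading `hostname_port` token doubles as the mpd id the script stores in MPD_PID and later passes to mpdexit/mpdallexit. A standalone sketch of that parsing, using a hypothetical sample line rather than real mpdtrace output:

```python
import re


def parse_ip_port(mpdtrace_line):
    """Extract (ip, port) from an `mpdtrace -l` line of the form
    'hostname_port (ip)'."""
    m = re.search(r"_([^ ]*) \(([^)]*)\)", mpdtrace_line)
    return m.group(2), m.group(1)


# Hypothetical first line of `mpdtrace -l` output.
line = "node1_55555 (192.168.0.7)"
ip, port = parse_ip_port(line)
mpd_pid = line.split(" ")[0]  # what the script stores as MPD_PID
```

Note the regex anchors on the last `_`-separated token before the space, so hostnames containing underscores would confuse it; mpd-generated ids keep the port as the final underscore-delimited field.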

Propchange: incubator/mesos/trunk/mpi/mpiexec-mesos.py
------------------------------------------------------------------------------
    svn:executable = *


