incubator-mesos-dev mailing list archives

From "Harvey Feng" <h.f...@berkeley.edu>
Subject Re: Review Request: Updates and additions to the MPI framework
Date Fri, 01 Jun 2012 10:02:13 GMT


> On 2012-05-25 18:12:45, Jessica wrote:
> > frameworks/mpi/mpiexec-mesos.py, line 61
> > <https://reviews.apache.org/r/4768/diff/8/?file=109962#file109962line61>
> >
> >     I've been puzzling over why the return is an issue with this
> >     revision when it wasn't with earlier revisions, and I believe it's
> >     because the return is inside the for loop. Before, the return was
> >     outside the loop, so we'd always complete the loop; once the loop
> >     completed, we'd check whether we had enough mpds and, if so, launch.
> >     With this revision, we may never complete the loop and thus never
> >     check whether we have enough resources. I think a break would solve
> >     the problem, provided it's acceptable not to respond to all of the
> >     offers. Otherwise, we need to make sure to decline all offers.
> 
> Harvey Feng wrote:
>     You're right, I missed this :(. A continue would make sure we decline
>     all the offers once enough tasks have been launched.
> 
> Jessica wrote:
>     Yes; however, after further investigation, I've discovered that
>     completing the function results in threading.Thread(target=mpiexec).start()
>     getting called multiple times. So it either needs to go back to how it
>     was before (with the return outside the loop) or there needs to be some
>     kind of flag indicating whether the thread has already been launched.
>     (I used the flag approach and it worked fine, but maybe you have a
>     better idea.)

Fixed by adding a flag.
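
For reference, the fix looks roughly like the following (a minimal sketch
assuming the Mesos Python scheduler API; TOTAL_MPDS, mpdsLaunched,
mpiexecStarted, and makeMpdTask are illustrative names, not the exact
identifiers in mpiexec-mesos.py):

    import threading

    def resourceOffers(self, driver, offers):
      for offer in offers:
        if self.mpdsLaunched >= TOTAL_MPDS:
          # Enough mpds are running: decline the remaining offers
          # instead of returning out of the loop early.
          driver.declineOffer(offer.id)
          continue
        tasks = [self.makeMpdTask(offer)]  # hypothetical helper
        driver.launchTasks(offer.id, tasks)
        self.mpdsLaunched += len(tasks)
      # The flag guards against starting the mpiexec thread more than
      # once, now that resourceOffers() runs to completion on every
      # offer callback.
      if self.mpdsLaunched >= TOTAL_MPDS and not self.mpiexecStarted:
        self.mpiexecStarted = True
        threading.Thread(target=mpiexec).start()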


- Harvey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review8116
-----------------------------------------------------------


On 2012-05-23 23:44:52, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-23 23:44:52)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> - nmpiexec.py
>   -> 'mpdallexit' should terminate all the slaves' mpds in the ring. I
>      moved 'driver.stop()' to statusUpdate() so that the driver stops once
>      all tasks have finished, which happens when the executor's launched
>      mpd processes have all exited. (See the sketch after this list.)
> - startmpd.py
>   -> Kept cleanup(), and added code in shutdown() that manually kills mpd
>      processes. They might be useful during abnormal (cleanup) and normal
>      (shutdown) framework/executor termination. cleanup() still terminates
>      all mpds on the slave, but shutdown() doesn't.
>   -> killtask() stops the mpd associated with the given tid.
>   -> Task states now update cleanly; each corresponds to the state of the
>      task's associated mpd process.
> - Readme
>   -> Added info on how to set up and run MPICH2 1.2 and nmpiexec on OS X
>      and Ubuntu/Linux.
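> 
> A minimal sketch of the statusUpdate() change described above (assuming
> the Python scheduler API and mesos_pb2 task states; finishedTasks and
> totalTasks are illustrative counters, not the exact names in the patch):
> 
>     import mesos_pb2
> 
>     def statusUpdate(self, driver, update):
>       # Each task tracks one mpd process; a task reaches TASK_FINISHED
>       # when its mpd exits (e.g. after mpdallexit tears down the ring).
>       if update.state == mesos_pb2.TASK_FINISHED:
>         self.finishedTasks += 1
>       # Stop the driver only once every launched mpd has exited.
>       if self.finishedTasks == self.totalTasks:
>         driver.stop()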
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/mpiexec-mesos PRE-CREATION 
>   frameworks/mpi/mpiexec-mesos.py PRE-CREATION 
>   frameworks/mpi/README.txt cdb4553 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>

