incubator-tashi-dev mailing list archives

From "Ryan, Michael P" <michael.p.r...@intel.com>
Subject RE: Support of Tashi
Date Fri, 18 Sep 2009 15:54:50 GMT
First off, I'd like to thank you very much for your interest and involvement in Tashi.  I've
tried to respond to the specific issues listed:

Priority 1: Authentication/Encryption

I agree this is a high priority item.  A student working over the summer (Michael Wang) modified
Tashi to use RPyC, which provides a user authentication mechanism as well as a secure channel
for requests.  We haven't done extensive testing, but it appears to provide most of what we
want.  It requires some manual configuration at this point.  Before I dig deeper, I'd like
to know whether this approach would be unsatisfactory for you for any reason.

Priority 2: Network configuration

I agree that this will likely be an ongoing issue.  In our current infrastructure, we have
a DMZ (with 10 public IPs), a general network, and several private VLANs.  We have assumed
control of the DMZ and the general network, but are having users run their own DNS and DHCP
servers in the private VLANs.  I agree completely with the strategy you suggest -- implement
what we need now with an eye toward future extensions.

Priority 3: Site-specific plugins

This is similar to the last point in that we need to implement what we need now while trying
to keep it extensible, but we won't really know all the requirements until more sites are
using Tashi.

Priority 4: VM scheduling model

The basic scheduler (primitive.py) doesn't do much in this space.  We have, however, implemented
a bridge that allows the use of Maui, a resource scheduler, to control VM creation.  This
should allow the use of more advanced scheduling techniques for things like priorities and
quotas.  A basic system of billing would be possible by using this as well, but it would seem
advantageous to have Tashi support a more direct and systematic form of billing.

Priority 5: Physical boot

We have looked at this a fair bit and have drawn two basic conclusions.  One is that if we
properly isolate physical machines (using VLANs, routing, and other techniques), we can keep
a rogue DHCP server from affecting the entire cluster and confine its effect to a private
VLAN (presumably owned and managed by one user or group).  We are working with others at HP
on a project called PRS that will be responsible for the physical booting.  It will
automatically reprogram switches and other networking infrastructure to limit the access of
an end-host and set up servers to perform the PXE booting.  The other conclusion is that, in
general, current hardware lacks the ability to limit modifications to the BIOS and other system
hardware by a privileged user in the operating system.

We have thought of dealing with these problems in two ways: limiting the impact of the first
through network isolation, as mentioned above, and disincentivizing the latter behavior with
a billing system that bills a user until the machine is returned (i.e., until it PXE boots a
base image we provide).  As you mention, this feature is just beginning to materialize.
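
As a rough illustration of the "bill until returned" idea, here is a minimal sketch.  The
function name, its arguments, and the hourly granularity are all assumptions made for this
example; nothing like this exists in Tashi today.

```python
from datetime import datetime
from typing import Optional


def billable_hours(checked_out: datetime,
                   returned: Optional[datetime],
                   now: datetime) -> float:
    """Hours billed for one physically booted machine (hypothetical helper).

    `returned` is the time the machine was observed PXE booting the base
    image we provide.  None means it has not come back yet, so the user
    keeps being billed up to `now`.
    """
    end = returned if returned is not None else now
    # Clamp at zero in case of clock skew between observations.
    return max((end - checked_out).total_seconds(), 0.0) / 3600.0
```

The point of the design is that the meter stops only when we regain control of the machine,
so a user who prevents PXE boot simply keeps paying.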

Priority 6: Multi-VM job control

This may be solvable by using Maui as the scheduler.  Either way, I agree that this is a
scheduler-only change and shouldn't be tremendously difficult with respect to Tashi (although
synchronized operations are always a little challenging in a cluster).
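
To make the all-or-nothing requirement concrete, here is a minimal sketch of a group start.
It is not how primitive.py or the Maui bridge works; the function names, the memory-only
capacity model, and the start_vm/stop_vm callbacks are hypothetical.  Placement uses a simple
first-fit-decreasing heuristic, so it can reject a group that a smarter packer would fit.

```python
from typing import Callable, Dict, List, Optional


def place_group(free_mem: Dict[str, int],
                vm_mem: List[int]) -> Optional[Dict[int, str]]:
    """Return {vm index: host} if the heuristic fits every VM, else None.

    Works on a copy of the free-memory table, so a failed attempt
    leaves the caller's view of the cluster untouched.
    """
    remaining = dict(free_mem)
    placement: Dict[int, str] = {}
    # Place the largest VMs first (first-fit decreasing).
    for idx in sorted(range(len(vm_mem)), key=lambda i: -vm_mem[i]):
        host = next((h for h in sorted(remaining)
                     if remaining[h] >= vm_mem[idx]), None)
        if host is None:
            return None  # one VM does not fit, so start none of them
        remaining[host] -= vm_mem[idx]
        placement[idx] = host
    return placement


def start_group(free_mem: Dict[str, int], vm_mem: List[int],
                start_vm: Callable[[int, str], None],
                stop_vm: Callable[[int, str], None]) -> bool:
    """Start the whole group or nothing; tear down on partial failure."""
    placement = place_group(free_mem, vm_mem)
    if placement is None:
        return False
    started = []
    try:
        for idx, host in placement.items():
            start_vm(idx, host)  # hypothetical callback into the cluster
            started.append((idx, host))
    except Exception:
        # Roll back the VMs that did start, most recent first.
        for idx, host in reversed(started):
            stop_vm(idx, host)
        return False
    return True
```

The hard part in a real cluster is the one this sketch skips: the free-memory table can
change between the placement check and the actual starts.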

To respond to your question about joining in to propose and develop solutions, I'd like
to warmly welcome you to do so.  I have sent this email to the tashi-dev mailing list and
BCC'd all of the original recipients (to avoid exposing email addresses).  I'd be happy to
continue any discussion on the mailing list.  You can join the mailing list by emailing
tashi-dev-subscribe@incubator.apache.org.  Additionally, if you have code, patches, ideas,
or documentation to contribute, sending them to the list is the right way to get them applied
to SVN.  The basic way forward is for us to continue this discussion by exchanging ideas and
code.  If you want to get even more involved, we could look into making one or more of you
committers after some further interactions.

In terms of testing, I haven't written much documentation.  The procedure works roughly as follows:

1. Install on a small testbed (2-3 nodes) and test all basic features as well as any new functionality.
2. If the change affects the cluster manager (CM), stop the scheduler, back up the CM's data,
update the software, and restart the CM and scheduler on the production cluster.
3. Incrementally update the software on the nodes, simply killing the node manager process
and restarting it (everything should automatically reload).  Again, this is on our production
cluster.

Obviously, whenever the format of the data checkpointed by the node manager has changed, the
checkpointed data must be converted between the exit and the restart.
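
The production part of that procedure (steps 2 and 3) can be sketched as an ordered plan.
This is only an illustration: the function and its step strings are hypothetical, it assumes
the release has already passed on the testbed, and the order of "kill" versus "update" on
each node is my assumption rather than a fixed rule.

```python
from typing import List


def upgrade_plan(nodes: List[str],
                 touches_cm: bool,
                 nm_format_changed: bool) -> List[str]:
    """Return the ordered production steps for one release (sketch only)."""
    steps: List[str] = []
    if touches_cm:
        # Step 2: only needed when the cluster manager is affected.
        steps += [
            "stop scheduler",
            "back up CM data",
            "update CM software",
            "restart CM",
            "restart scheduler",
        ]
    # Step 3: roll through the nodes one at a time.
    for node in nodes:
        steps.append(f"{node}: kill node manager")
        steps.append(f"{node}: update software")
        if nm_format_changed:
            # Conversion has to happen between the exit and the restart.
            steps.append(f"{node}: convert checkpointed data")
        steps.append(f"{node}: restart node manager")
    return steps
```

For example, upgrade_plan(["node1", "node2"], touches_cm=False, nm_format_changed=True)
yields four steps per node, with the data conversion placed just before each restart.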

Again, thank you very much for your time and energy.  I appreciate the detailed analysis of
the current system and look forward to working with you in the future.

- Michael

-----Original Message-----
From: Sheen, Robert 
Sent: Thursday, September 17, 2009 6:02 AM
To: Ryan, Michael P
Subject: RE: Support of Tashi

Dear Ryan,

	This is Robert Sheen at Taiwan HP. I would like to ask for your support in helping III resolve
its questions about Tashi. III is planning to join Open Cirrus and has already installed Tashi
at its site. Your help will speed up the collaboration; thanks in advance.
 

	Dr. Hsieh of III is on the cc list. After studying the Tashi slides, he would like to know
the current status of the known issues listed below and, if III wants to join in proposing
and developing solutions for them, how to proceed. What procedure needs to be followed?
Thanks!

	The second question: III is drafting a test plan for the Tashi environment. Mr. Chen would
like to ask for any existing test procedure document to use as a reference. Thanks!


•    Priority 1: Authentication/Encryption

–    Virtual cluster owner authentication has not been resolved in the current Tashi implementation

–    Plan: select a user account management scheme soon and implement (probably via SSL)

•    Priority 2: Network configuration

–    Site-specific network configuration will probably be an on-going thorny issue.  How
many global IP addresses are available?  Which private subnets are available?  Do the physical
cluster owners have control over local DHCP/DNS servers? Etc.

–    Plan: implement something that works for the first few Tashi sites, architect the site-specific
plugin to enable modification, adapt as new needs surface

•    Priority 3: Site-specific plugins 

–    Are agents capable of doing all of the site-specific logic needed to create and manage
VMs?

–    Plan: Solicit feedback from partners to determine for which steps in VM creation/activation
customization is critical

•    Priority 4: VM scheduling model

–    Tashi does not currently have a well-integrated scheduler that supports VM priorities,
quotas, billing, etc.

–    Plan: Implement features on “as needed” basis

•    Priority 5: Physical boot

–    A number of security concerns have surfaced here if the owner of the physically-booted
machine is not completely trusted (or if a trusted, but naïve, owner’s machine becomes
compromised). What if a DHCP server is started that competes with the cluster’s server?
 If we rely on PXE boot to regain control, can we prevent a physical owner from reprogramming
the BIOS to prevent PXE boot?  What are the best monitoring/control options? Etc.

–    Plan: do not offer physical boot in Tashi until security model is better understood

•    Priority 6: Multi-VM job control

–    The current scheduling agent activates VMs one at a time.  A transactional mechanism
needs to be added that only starts a VM group if there is room to accommodate the entire group
and enables easy tear-down if any portion of the group fails

–    Plan: Extend scheduler with such a feature, should be straight-forward

 

Best Rgrds,
Robert Sheen
沈 仲 杰
HP TSG Pre-Sales
Solution Manager