incubator-cvs mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Incubator Wiki] Update of "TashiProposal" by DavidOHallaron
Date Thu, 10 Jul 2008 14:53:37 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The following page has been changed by DavidOHallaron:
http://wiki.apache.org/incubator/TashiProposal

------------------------------------------------------------------------------
  
  A proposal to the Apache Software Foundation Incubator PMC by
  
+ David O'Hallaron, Michael Ryan, Michael Kozuch, Steven Schlosser, Greg Ganger, Garth Gibson,
Julio Lopez, Michael Stroucken, Jim Cipar, Wittawat Tantisiriroj, Doug Cutting, Jay Kistler,
Thomas Kwan
- 
- 
- David O'Hallaron, Michael Ryan, Michael Kozuch, Steven Schlosser, Greg
- 
- Ganger, Garth Gibson, Julio Lopez, Michael Strouken, Jim Cipar,
- 
- Wittawat Tantisiriroj, Doug Cutting, Jay Kistler, Thomas Kwan
  
  
  
@@ -22, +16 @@

  
  Tashi is a cluster management system for cloud computing on Big Data.
  
- 
- 
  == 2. Proposal ==
  
+ The Tashi project aims to build a software infrastructure for cloud computing on massive
internet-scale datasets (what we call Big Data). The idea is to build a cluster management
system that enables the Big Data that are stored in a cluster/data center to be accessed,
shared, manipulated, and computed on by remote users in a convenient, efficient, and safe
manner.  The system aims to provide the following basic capabilities:
  
- The Tashi project aims to build a software infrastructure for cloud
+ (a) On-demand provisioning of storage and compute resources. Users request a number of compute
nodes, which can be either virtual or physical machines, and a set of disk images to boot
up on the nodes. In response they receive their own persistent logical cluster of compute
and storage nodes, which they can then manage and use.
  
- computing on massive internet-scale datasets (what we call Big Data).
+ (b) Extensible end-to-end system management. Tashi will define open non-proprietary interfaces
for management tasks such as observation, inference, planning, and actuation. This will keep
the system vendor-neutral and allow different research and development groups to plug in different
implementations of different management modules.
  
- The idea is to build a cluster management system that enables the Big
+ (c) Cooperative storage and compute management.  The system will define new non-proprietary
interfaces and methods that will allow compute and storage management to work together in
concert.
  
- Data that are stored in a cluster/data center to be accessed, shared,
+ (d) Flexible storage models. The system will support a range of different storage models,
such as network-attached storage, per-node storage, and hybrids, to allow developers, researchers,
and large scale cluster/data center operators to experiment with different kinds of file systems.
  
- manipulated, and computed on by remote users in a convenient,
+ (e) Flexible machine models. The system will support different machine models.  In particular,
it will be VMM-agnostic, able to run different virtual machine monitors such as KVM and Xen.
Also, in order to address the cluster squatting problem (when clusters are balkanized by users
who reserve and hold nodes for their exclusive use), the system will support a novel bi-modal
booting capability, in which virtual machine and physical machine instances can boot from
the same disk image.
  
- =efficient, and safe manner.  The system aims to provide the following
+ == 3. Rationale and Approach ==
  
- basic capabilities:
+ Digital media, pervasive sensing, web authoring, mobile computing, scientific and medical
instruments, physical simulations, and virtual worlds are all delivering vast new datasets
relating to every aspect of our lives. A growing fraction of this Big Data is going unused
or being underexploited due to the overwhelming scale of the data involved.  Effective sharing,
understanding, and use of this new wealth of raw information poses one of the great challenges
for the new century.
  
+ In order to compute on this emerging Big Data, many research and development groups are
purchasing their own racks of compute and storage servers. The goal of the Tashi project is
to develop a layer of utility software that turns these raw racks of servers into easily managed
cloud computers that will allow remote users to share and explore their Big Data.
  
+ To our knowledge there are no open source projects addressing cluster management for Big
Data applications. We need a project such as Tashi for a number of reasons: (1) No cloud computing
cluster management systems have tackled the problem of having both compute and storage management
working together in concert, which we believe will be necessary to support Big Data. (2) We
need non-proprietary interfaces for cloud computing, and open source is the way to develop
these. For example, Google's new App Engine and Amazon's web services require people to build
to proprietary APIs, so that their applications are no longer vendor-neutral but are tied
to a particular service provider. (3) We need an extensible system that can serve as a platform
to stimulate research in cluster management for cloud computing.
- 
- (a) On-demand provisioning of storage and compute resources. Users
- 
- request a number of compute nodes, which can be either virtual or
- 
- physical machines, and a set of disk images to boot up on the
- 
- nodes. In response they receive their own persistent logical cluster
- 
- of compute and storage nodes, which they can then manage and use.
- 
- 
- 
- (b) Extensible end-to-end system management. Tashi will define open
- 
- non-proprietary interfaces for management tasks such as observation,
- 
- inference, planning, and actuation. This will keep the system
- 
- vendor-neutral and allow different research and development groups to
- 
- plug in different implementations of different management modules.
- 
- 
- 
- (c) Cooperative storage and compute management.  The system will
- 
- define new non-proprietary interfaces and methods that will allow
- 
- compute and storage management to work together in concert.
- 
- 
- 
- (d) Flexible storage models. The system will support a range of
- 
- different storage models, such as network-attached storage, per-node
- 
- storage, and hybrids, to allow developers, researchers, and large
- 
- scale cluster/data center operators to experiment with different kinds
- 
- of file systems.
- 
- 
- 
- (e) Flexible machine models. The system will support different machine
- 
- models.  In particular, it will be VMM-agnostic, able to run different
- 
- virtual machine monitors such as KVM and Xen. Also, in order to
- 
- address the cluster squatting problem (when clusters are balkanized by
- 
- users who reserve and hold nodes for their exclusive use) the system
- 
- will support a novel bi-model booting capability, in which virtual
- 
- machine and physical machine instances can boot from the same disk
- 
- image.
- 
- 
- 
- == 3. Rationale and Approach
-  ==
- 
- 
- Digital media, pervasive sensing, web authoring, mobile computing,
- 
- scientific and medical instruments, physical simulations, and virtual
- 
- worlds are all delivering vast new datasets relating to every aspect
- 
- of our lives. A growing fraction of this Big Data is going unused or
- 
- being underexploited due to the overwhelming scale of the data
- 
- involved.  Effective sharing, understanding, and use of this new
- 
- wealth of raw information poses one of the great challenges for the
- 
- new century.
- 
- 
- 
- In order to compute on this emerging Big Data, many research and
- 
- development groups are purchasing their own racks of compute and
- 
- storage servers. The goal of the Tashi project is to develop a layer
- 
- of utility software that turns these raw racks of servers into easily
- 
- managed cloud computers that will allow remote users to share and
- 
- explore their Big Data.
- 
- 
- 
- To our knowledge there are no open source projects addressing cluster
- 
- management for Big Data applications. We need a project such as Tashi
- 
- for a number of reasons: (1) No cloud computing cluster management
- 
- systems have tackled the problem of having both compute and storage
- 
- management working together in concert, which we believe will be
- 
- necessary to support Big Data. (2) We need non-proprietary interfaces
- 
- for cloud computing, and open source is the way to develop these. For
- 
- example, Google's new App Engine and Amazon's web services require
- 
- people to build to proprietary API's, so that their applications are
- 
- no longer vendor neutral, but are tied to a particular service
- 
- provider. (3) We need an extensible system that can serve as a
- 
- platform to stimulate research in cluster management for cloud
- 
- computing.
- 
- 
  
  The Tashi system is targeted at two (not always distinct) communities:
  
- (1) As a production system for organizations who want to offer medium
+ (1) As a production system for organizations who want to offer medium to large scale clusters
to their users. For example, many companies and university departments are purchasing such
clusters, and a system like Tashi would help them provide their users with access to the cycles
and storage in the clusters. 
  
- to large scale clusters to their users. For example, many companies
+ (2) As an extensible research platform for distributed systems researchers.
  
- and university departments are purchasing such clusters, and a system
+ The approach for the project is to build on existing cluster management work pioneered by
projects such as Usher (UCSD), Cluster on Demand (Duke), and EC2/S3 (Amazon), and then develop
the new capabilities that will be required to support Big Data cloud computing.
  
- like Tashi would help them provide their users with access to the
+ == 4. Need for a Community Effort ==
  
- cycles and storage in the clusters. (2) As an extensible research
+ A number of events at Yahoo, Carnegie Mellon, and Intel Research Pittsburgh motivated the
development of Tashi and convinced us to work together in the context of an open-source community:
  
- platform for distributed systems researchers.
+ (a) In 2006 the Parallel Data Lab (PDL) at Carnegie Mellon built a cluster of 400 nodes
from industry donations, with a goal of creating a "Data Center Observatory" that would allow
systems researchers to study and monitor applications running on the cluster. This dream has
been slow to materialize because of the cost and complexity of supporting and managing multiple
applications and systems groups.
  
+ (b) In Fall 2007, Yahoo began offering access to their M45 research cluster to researchers
at Carnegie Mellon, and in order to support M45 as well as their own internal production clusters,
began to develop some cloud computing infrastructure on their own.
  
+ (c) In Fall 2007, Intel Research Pittsburgh purchased a moderate-sized 100-node cluster
and made it available to applications groups at Carnegie Mellon working on various Big Data
applications such as computational photography, machine translation, automatic speech 
+ recognition, and event detection in spatio-temporal video streams. Provisioning and scheduling
the cluster in the face of so many different application demands has proven to be difficult.
  
- The approach for the project is to build on existing cluster
+ The difficulties of managing and provisioning these different clusters convinced us that
the problem was too big for any one of us to solve completely on our own, and that we needed
to band together to create an open-source community effort focused on developing a single software
system.
  
- management work pioneered by projects such as Usher (UCSD), Cluster on
+ Another important reason to develop an open-source community around Tashi is that we need
non-proprietary vendor-neutral APIs for the 
+ emerging area of cloud computing, and open source is the best way to achieve that.
  
- Demand (Duke), and EC2/S3 (Amazon), and then develop the new
+ == 5. Known Risks ==
  
- capabilities that will be required to support Big Data cloud
+ ''Commitment to future development.'' The risk of the developers abandoning the project
is small, mainly because they all own and manage moderate to large scale clusters, and desperately
need something like Tashi to provision and manage those clusters. We also need a system like
Tashi to serve as an extensible platform for our research.
  
- computing.
+ ''Experience with open source.'' Yahoo has had a significant and positive experience with
the Apache Software Foundation (ASF) and Hadoop. While Intel and Carnegie Mellon have developed
some non-ASF style open source projects in the past (e.g., Internet Suspend/Resume, OpenDHT,
and OpenDiamond), they have no experience with ASF-style open source communities. However,
they hope to benefit from Yahoo's considerable experience in this area.
  
+ ''Diversity of developer community.'' The initial code base for Tashi was developed by a
single research programmer, Michael Ryan, at Intel Research Pittsburgh. An important reason
for putting Tashi in the incubator is to expand the set of developers to include programmers
from Carnegie Mellon and Yahoo, initially, and later, hopefully, from other groups such as
Usher at UCSD, cluster-on-demand from Duke University, and the RAD Lab at Berkeley.
  
+ ''Relationship to other Apache projects.'' There are no Apache projects such as Tashi that
focus on systems support for cloud computing. However, the Tashi project is closely related
to Hadoop/HDFS. The VM-based provisioning of Tashi will subsume the now 
+ deprecated sub-clustering functionality of Hadoop-on-demand. The Tashi prototype uses HDFS
to host the cluster boot images. Also, we expect that many Tashi logical clusters will run
Hadoop jobs.
  
+ ''Reasons that Tashi is an ASF project.'' There are three main reasons for developing Tashi
through Apache rather than, say, SourceForge. (1) Our Yahoo partner has had a very positive
experience with the Hadoop project. (2) We recognize the need to build a strong developer
community, and Apache is centered around building such communities. (3) The ASF also offers
substantial legal oversight that makes it attractive for cross-organizational collaborative
efforts such as Tashi.  With SourceForge, for example, you have few guarantees about the title
of the code.  Thus, people can easily post code they don't own, and/or change the license
terms of other open source code that they include in their projects.  So users of code from
SourceForge must be wary.  On the other hand, Apache vets all contributions, keeping signed
documents from every committer on file, etc.
- == 4. Need for a Community Effort
-  ==
  
- 
- A number of events at Yahoo, Carnegie Mellon, and Intel Research
- 
- Pittsburgh motivated the development of Tashi and convinced us to work
- 
- together in the context of an open-source community:
- 
- 
- 
- (a) In 2006 the Parallel Data Lab (PDL) at Carnegie Mellon built a
- 
- cluster of 400 nodes from industry donations, with a goal of creating
- 
- a "Data Center Observatory" that would allow systems researchers to
- 
- study and monitor applications running on the cluster. This dream has
- 
- been slow to materialize because of the cost and complexity of
- 
- supporting and managing multiple applications and systems groups.
- 
- 
- 
- (b) In Fall 2007, Yahoo began offering access to their M45 research
- 
- cluster to researchers at Carnegie Mellon, and in order to support M45
- 
- as well as their own internal production clusters, began to develop
- 
- some cloud computing infrastructure on their own.
- 
- 
- 
- (c) In Fall 2007, Intel Research Pittsburgh purchased a moderate-sized
- 
- 100-node cluster and made it available to applications groups at
- 
- Carnegie Mellon working on various Big Data applications such as
- 
- computational photography, machine translation, automatic speech
- 
- recognition, and event detection in spatio-temporal video
- 
- streams. Provisioning and scheduling the cluster in the face of so
- 
- many different application demands has proven to be difficult.
- 
- 
- 
- The difficulties of managing and provisioning these different clusters
- 
- convinced us that the problem was too big for any one of us to solve
- 
- completely on our own, and that we needed to band together create a
- 
- open-source community effort focused on developing a single software
- 
- system.
- 
- 
- 
- Another important reason to develop an open-source community around
- 
- Tashi is that we need non-proprietary vendor-neutral APIs for the
- 
- emerging area of cloud computing, and open source is the best way to
- 
- achieve that.
- 
- 
- 
- == 5. Known Risks
-  ==
- 
- 
- Commitment to future development. The risk of the developers
- 
- abandoning the project is small, mainly because they all own and manage
- 
- moderate to large scale clusters, and desperately need something
- 
- like Tashi to provision and manage those clusters.
- 
- 
- 
- Experience with open source. Yahoo has had a significant and positive
- 
- experience with the Apache Software Foundation (ASF) and Hadoop.
- 
- While Intel and Carnegie Mellon have developed some non-ASF style open
- 
- source projects in the past (e.g., Internet Suspend/Resume, OpenDHT,
- 
- and OpenDiamond), they have no experience with ASF-style open source
- 
- communities. However, they hope to benefit from Yahoo's considerable
- 
- experience in this area.
- 
- 
- 
- Diversity of developer community. The initial code base for Tashi was
- 
- developed by a single research programmer, Michael Ryan, at Intel
- 
- Research Pittsburgh. An important reason for putting Tashi in the
- 
- incubator is to expand the set of developers to include programmers
- 
- from Carnegie Mellon and Yahoo, initially, and later, hopefully, from
- 
- other groups such as Usher at UCSD, Cluster-on-demand from Duke
- 
- University, and the RAD Lab at Berkeley.
- 
- 
- 
- Relationship to other Apache projects. There are no Apache projects
- 
- such as Tashi that focus on systems support for cloud
- 
- computing. However, the Tashi project is closely related to
- 
- Hadoop/HDFS. The VM-based provisioning of Tashi will subsume the now
- 
- deprecated sub-clustering functionality of Hadoop-on-demand. The Tashi
- 
- prototype uses HDFS to host the cluster boot images. Also, we expect
- 
- that many Tashi logical clusters will run Hadoop jobs.
- 
- 
- 
- Reasons that Tashi is an ASF project. There are three main reasons for
- 
- developing Tashi through Apache rather than, say, SourceForge. (1) Our
- 
- Yahoo partner has had a very positive experience with the Hadoop
- 
- project. (2) We recognize the need to build a strong developer
- 
- community, and Apache is centered around building such communities.
- 
- (3) The ASF also offers substantial legal oversight that makes it
- 
- attractive for cross-organizational collaborative efforts such as
- 
- Tashi.  With Sourceforge, for example, you have few guarantee about
- 
- the title of the code.  Thus, people can easily post code they don't
- 
- own, and/or change the license terms of other open source code that
- 
- they include in their projects.  So users of code from Sourceforge
- 
- must be wary.  On the other hand, Apache vets all contributions,
- 
- keeping signed documents from every committer on file, etc.
- 
- 
- 
- == 6. Related Work
+ == 6. Related Work ==
-  ==
- 
  
  A small sampling of some closely related work:
  
+ [1] M. McNett, D. Gupta, A. Vahdat, G. Voelker, "Usher: An Extensible Framework for Managing
Clusters of Virtual Machines", Proceedings of the 21st Large Installation System Administration
Conference (LISA 07), 2007.
  
+ [2] D. Irwin, J. Chase, L. Grit, A. Yumerefendi, D. Becker, "Sharing Networked Resources
with Brokered Leases", Usenix, 2006.
  
- [1] M. McNett, D. Gupta, A. Bahdat, G. Voelker, "Usher: An Extensible
+ [3] J. Chase, D. Irwin, L. Grit, J. Moore, S. Sprenkle, "Dynamic Virtual Clusters in a Grid
Site Manager", HPDC, 2003.
  
+ [4] S. Garfinkel, "An Evaluation of Amazon's Grid Computing Services: EC2, S3, and SQS",
Tech Report TR-08-07, School of Engineering and Applied Sciences, Harvard University, 2007.
- Framework for Managing Clusters of Virtual Machines", Proceedings of
- 
- the 21st Large Installation System Administration Conference (LISA
- 
- 07), 2007.
- 
- 
- 
- [2] D. Irwin, J. Chase, L. Grit, A. Yumerefendi, D. Becker, "Sharing
- 
- Networked Resources with Brokered Leases", Usenix, 2006.
- 
- 
- 
- [3] J. Chase, D. Irwin, L. Grit, J. Moore, S. Sprenkle, "Dynamic
- 
- Virtual Clusters in a Grid Site Manager", HPDC, 2003.
- 
- 
- 
- [4] S. Garfinkel, "An Evaluation of Amazon's Grid Computing Services:
- 
- EC2, S3, and SQS", Tech Report TR-08-07, School for Engineering and
- 
- Applied Sciences, Harvard University, 2007.
  
  
  [5] RedHat oVirt System, http://ovirt.org, 2008
  
+ == 7. Source ==
  
  
- == 7. Source
- ==
+ We have working code, a pre-alpha proof-of-concept prototype that was developed by Michael
Ryan at Intel Research Pittsburgh. The prototype is currently running on the 100-node cluster
there. We will enter the incubator with clean code, developed entirely by Michael Ryan, that
is unencumbered by any licensing issues.
+ 
+ == 8. Required Resources ==
  
  
- We have working code, a pre-alpha proof-of-concept prototype that was
- 
- developed by Michael Ryan at Intel Research Pittsburgh. The prototype
- 
- is currently running on the 100-node cluster there. We will
- enter the incubator with clean code, developed entirely by Michael
- 
- Ryan, that is unencumbered by any licensing issues.
- 
- 
- 
- 
- 
- 
- == 8. Required Resources
-  ==
- 
- 
- (a) Mailing lists:
+ (a) Proposed Mailing lists:
- 
  tashi-private (with moderated subscriptions)
- 
  tashi-dev
- 
  tashi-commits
- 
  tashi-user
  
- 
- 
  (b) Subversion directory
- 
  http://svn.apache.org/repos/asf/incubator/tashi
  
- 
- 
  (c) Issue tracking:
- 
  Tashi will use JIRA for bug tracking.
  
  
  
- == 9. Initial Committers
+ == 9. Initial Committers ==
-  ==
  
  
+ Initially, we plan to start with one committer each from Carnegie Mellon and Intel Research,
with a Yahoo committer to be determined later:
- Initially, we plan to start with one committer each from Carnegie Mellon
- 
- and Intel Research, with a Yahoo committer to be determined later:
- 
- 
  
  Michael Stroucken (mxs@cmu.edu)
- 
  Michael Ryan (michael.p.ryan@intel.com)
  
  
+ == 10. Sponsors ==
  
- == 10. Sponsors
-  ==
+ (a) Champion: Doug Cutting (cutting@apache.org)
+ (b) Sponsoring entity: Apache Incubator PMC
  
- 
- (a) Champion
- 
- Doug Cutting (cutting@apache.org)
- 
- 
- 
- (b) Sponsoring entity
- 
- Apache Incubator PMC
- 

---------------------------------------------------------------------
To unsubscribe, e-mail: cvs-unsubscribe@incubator.apache.org
For additional commands, e-mail: cvs-help@incubator.apache.org

