incubator-cvs mailing list archives

From Apache Wiki <>
Subject [Incubator Wiki] Update of "BigtopProposal" by TomWhite2
Date Tue, 14 Jun 2011 04:18:51 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "BigtopProposal" page has been changed by TomWhite2:

New page:
= Bigtop - Apache Hadoop Ecosystem Packaging and Test =

== Abstract ==

Bigtop - a project for the development of packaging and tests of the Hadoop ecosystem.

== Proposal ==

The primary goal of Bigtop is to build a community around the packaging and interoperability
testing of Hadoop-related projects. This includes testing at various levels (packaging, platform,
runtime, upgrade, etc.) developed by a community with a focus on the system as a whole,
rather than individual projects.

Build, packaging and integration test code that depends upon official releases of the Apache
Hadoop-related projects (HDFS, MapReduce, HBase, Hive, Pig, ZooKeeper, etc.) will be developed
and released by this project. As bugs and other issues are found we expect these to be fixed

== Background ==

The initial packaging and test code for Bigtop was developed by Cloudera to package projects
from the Apache Hadoop ecosystem and provide a consistent, inter-operable framework.

== Rationale ==

Hadoop defines itself as:

The Apache Hadoop project develops open-source software for reliable, scalable, distributed
computing. Hadoop includes these subprojects:

* Hadoop Common: The common utilities that support the other Hadoop subprojects.
* HDFS: A distributed file system that provides high throughput access to application data.
* MapReduce: A software framework for distributed processing of large data sets on compute clusters.

There are also several other Hadoop-related projects at Apache.  Some TLP examples include
HBase, Hive, Mahout, ZooKeeper, and Pig.  There are also several new projects in the Incubator
such as HCatalog and Sqoop.

There is limited interoperability testing done by the projects themselves. The intent of this
project is to build a community where the projects are brought together, packaged, and tested
for interoperability.

Projects such as Apache Whirr (incubating), which deploy and use a collection of Hadoop-related
projects, would benefit from the interoperability testing done by Bigtop, rather than picking
and testing project combinations themselves.

== Initial Goals ==

Much of the code for Bigtop has been available from Cloudera under the Apache 2.0 license for
over two years.

Some current goals include:
 * create a set of packages for the Hadoop ecosystem, over a wide range of platforms
 * interoperability test these projects
 * document project sets that are known to work well together

Bigtop’s release artifact would consist of a single tarball of packaging and test code that,
when built, would produce source and binary Linux packages for the upstream projects.

= Current Status =

== Meritocracy ==

Bigtop was originally developed and released as an open source packaging infrastructure, CDH,
by Cloudera.

== Community ==

The community is primarily the original developers at Cloudera; however, a number of contributions
to the packaging specifications have been accepted from outside contributors. Growing a diverse
community is the main reason to bring Bigtop to the Apache Incubator.

== Core Developers ==

The core developers for the Bigtop project are:
 * Andrew Bayer has extensive expertise with build tools, specifically Jenkins continuous
integration and Maven.
 * Peter Linnell has contributed to the RPM packaging.
 * Bruno Mahé has overseen much of the development of the RPM and Debian packaging system.
 * Roman Shaposhnik designed and implemented the system testing framework.

Many of the committers to the Bigtop project have contributed towards Hadoop or related Apache
projects (Alejandro Abdelnur, Eli Collins, Patrick Hunt, John Sichi, Michael Stack, Tom White)
and are familiar with Apache principles and philosophy for community-driven software development.

== Alignment ==

We expect projects in Bigtop to be drawn from Hadoop and related projects at Apache. Bigtop
will complement these projects (Hadoop, Pig, Hive, HBase, etc.) by providing an environment
for contributors interested in building more complex data processing pipelines to work together
integrating more than a single project into a well-tested whole.

= Known Risks =

== Orphaned Products ==

The contributors are leading vendors of Hadoop-based technologies and have long standing
in the Hadoop community. There is minimal risk of this work becoming non-strategic and the
contributors are confident that a larger community will form within the project in a relatively
short space of time.

== Inexperience with Open Source ==

All code developed for Bigtop has been open sourced under the Apache 2.0 license. Most committers
of the Bigtop project are intimately familiar with the Apache model for open-source development
and are experienced with working with new contributors.

== Homogeneous Developers ==

The initial set of committers is from a small set of organizations and numerous existing Apache
projects. We expect that once approved for incubation, the project will attract new contributors
from more organizations and will thus grow organically.

== Reliance on Salaried Developers ==

It is expected that Bigtop will be developed on salaried and volunteer time, although all
of the initial developers will work on it mainly on salaried time.

== Relationships with Other Apache Products ==

Bigtop depends upon other Apache Projects including Apache Hadoop, Apache HBase, Apache Hive,
Apache Pig, Apache ZooKeeper, Apache Thrift, and Apache Avro. The build system uses Apache Ant
and Apache Maven.

== An Excessive Fascination with the Apache Brand ==

We would like Bigtop to become an Apache project to further foster a healthy community of
contributors and consumers around interoperability, testing and packaging of Hadoop projects.
Since Bigtop directly interacts with many Apache Hadoop-related projects and solves important
problems of many Hadoop users, residing in the Apache Software Foundation will increase
interaction with the larger community.

= Documentation =

 * Bigtop will develop its own documentation detailing how to build, test, install, configure
and debug.

= Initial Source =


== Source and Intellectual Property Submission Plan ==

 * The initial source is already licensed under the Apache License, Version 2.0.

== External Dependencies ==

The required external dependencies are all under the Apache License or compatible licenses.

== Cryptography ==

Bigtop doesn't use cryptography itself; however, Hadoop projects use standard APIs and tools
for SSH and SSL communication where necessary.

= Required Resources =

== Mailing lists ==

 * bigtop-private (with moderated subscriptions)
 * bigtop-dev
 * bigtop-commits
 * bigtop-user

== Subversion Directory ==

== Issue Tracking ==


== Other Resources ==

The existing code already has unit and integration tests so we would like a Jenkins instance
to run them whenever a new patch is submitted. This can be added after project creation.

= Initial Committers =

 * Alejandro Abdelnur (tucu at cloudera dot com)
 * Andrew Bayer (abayer at cloudera dot com)
 * Eli Collins (eli at cloudera dot com)
 * Travis Crawford (travis at twitter dot com)
 * Bruno Mahé (bruno at cloudera dot com)
 * Patrick Hunt (phunt at apache dot org)
 * Peter Linnell (plinnell at cloudera dot com)
 * James Page ( at canonical dot com)
 * Roman Shaposhnik (rvs at cloudera dot com)
 * John Sichi (jvs at apache dot org)
 * Michael Stack (stack at apache dot org)
 * Tom White (tomwhite at apache dot org)

= Affiliations =

 * Alejandro Abdelnur, Cloudera
 * Andrew Bayer, Cloudera
 * Eli Collins, Cloudera
 * Travis Crawford, Twitter
 * Bruno Mahé, Cloudera
 * Patrick Hunt, Cloudera
 * Peter Linnell, Cloudera
 * James Page, Canonical
 * Roman Shaposhnik, Cloudera
 * John Sichi, Facebook
 * Michael Stack, StumbleUpon
 * Tom White, Cloudera

= Sponsors =

== Champion ==

 * Patrick Hunt

== Nominated Mentors ==

 * Patrick Hunt
 * Tom White

== Sponsoring Entity ==

 * Apache Incubator PMC
