+1 Regards, Alan On Jan 18, 2010, at 12:17 PM, Justin Erenkrantz wrote: > Per the previous proposal that was sent to general@incubator, I'd like > to now call a vote for accepting OODT into the Incubator. > > [ ] +1 - Accept OODT into the Incubator > [ ] -1 - Do not accept OODT into the Incubator (rationale strongly > desired!) > > Per the proposal below, the mentors for this proposal are myself, > Ross, Jean-Frederic, and Ian. > > Unless there are any extenuating circumstances, I will close this vote > on Thursday morning, January 21. > > Thanks. -- justin > > http://wiki.apache.org/incubator/OODTProposal > > OODT, a grid middleware framework for science data processing, > information integration, and retrieval. > > Abstract > > OODT is a grid middleware framework used on a number of successful > projects at NASA's Jet Propulsion Laboratory/California Institute of > Technology, and many other research institutions and universities, > specifically those part of the: > > National Cancer Institute's (NCI's) Early Detection Research Network > (EDRN) project - over 40+ institutions all performing research into > discovering biomarkers which are early indicators of disease. > > NASA's Planetary Data System (PDS) - NASA's planetary data archive, a > repository and registry for all planetary data collected over the past > 30+ years. > > various Earth Science data processing missions, including > Seawinds/QuickSCAT, the Orbiting Carbon Observatory, the NPP Sounder > PEATE project, and the Soil Moisture Active Passive (SMAP) mission. > > From the OODT website: > > It's middleware for metadata: > > Transparent access to distributed resources > Data discovery and query optimization > Distributed processing and virtual archives > > It's a software architecture: > > Models for information representation > Solutions to knowledge capture problems > Unification of technology, data, and metadata > > Proposal > > OODT is an established open source project, with 9+ years of > existence, and deployment at universities, federal research > institutions, other NASA centers, and the NIH (it won runner-up NASA > software of the year in 2003). It has a strong community of those that > operate and support its growth. Our proposal is to bring OODT into > Apache to strengthen its support and its capabilities even further on > the laurels of Apache's brand and its growing huge community of > developers from all over the world. In short, bringing OODT into > Apache will significantly enhance OODT's widespread use, will likely > improve its codebase, and furthermore will help Apache philosophy and > community spread into OODT's already large community-base reaching > across government, academia and industry. > > OODT will be, to the best of our knowledge, the first grid community > project to bear the Apache brand. By grid technology, we mean a > technology that provides the ability to create virtual organizations, > as originally described by Kesselman and Foster in their seminal paper > on grid computing. OODT provides both computational and data grid > support, and is built with a component-philosophy. OODT includes > components that allow for virtual information integration across > organizations (provided by the Profile, Product and Query server > components), and that allow for distributed data management and > processing across heterogeneous virtual organizations (provided by the > Catalog and Archive Service set of components, including File Manager, > Workflow Manager and Resource Manager). > > Each set of components exist as independently organized Maven2 > projects, that reference each other (where appropriate), forming a > layered set of components and a framework for grid computing. > > Background > > OODT is an established project within JPL and in use at several NASA > centers, as well as univerities, and other government organizations > and industrial collaborations. Chris Mattmann, a JPL employee, and ASF > PMC (Lucene) and Committer (Nutch, Tika), has been working for the > past 2 years on obtaining the necessary permission from JPL to release > OODT into Apache. After initially being stalled, JPL has granted > permission to allow OODT into Apache. > > Through his academic relationship with Justin Erekrantz, Apache > President, and through their collective Ph.D. studies, OODT has been > discussed between Chris and Justin on several occasions, and Justin > offered to help champion OODT into the Apache Incubator when JPL was > ready to release OODT. In December 2009, that permission was granted. > > This proposal is the result of the above efforts and related > discussions. Some alternatives to incubation, like Apache Labs came up > during the discussions but we believe that taking the project to the > Incubator is the best way to start growing a viable Apache-based > community to sustain OODT. Furthermore, given its larger code base and > existing sub-projects, the goal would be for OODT to leverage the > incubator to graduate into Apache's first top-level grid project, > rather than graduate into a sub-project of an existing TLP. > > Rationale > > Grid computing has been around for the past 10 years and has gained > widespread notoriety and attention in industry and academia. > Scientific collaborations are increasingly virtual and require the > capabilities (data and compute) of thousands of computers and > resources that span organizations. There are a number of existing grid > technologies (Globus being the most popular, DSpace, iRODS/Storage > Resource Broker, see this paper for a full study), however Apache has > no current grid technology under its umbrella and world reknown think > tank. Morever, efforts are few and far between in terms of standing up > Apache-based software that is applicable to the scientific community > and grid community outside of use of fine-grained components in these > systems. Other open source organizations (e.g., the Global > Organization for Earth System Science Portals, GO-ESSP) have embraced > the construction of such technology and there is a lot of work going > on, e.g., at NOAA. This proposal aims to remedy this fact and to bring > scientific data management/grid software into the Apache family and > its worldwide community. > > OODT is a widely successful grid project with applicability and > existing deployments across broad-reaching domains (planetary and > earth sciences, cancer research/biomedicine, climate modeling and > atmospheric science, etc.). The marriage of OODT and Apache will > engender OODT's widespread, global use via the Apache brand, and will > make Apache a player in the grid/scientific data community. > > Initial Goals > > The initial goals of the proposed project are: > > Stand up a sustaining Apache-based community around the OODT codebase. > Active relationships and possible cooperation with related projects > and communities. > Refactor and bring up-to-date the OODT profile and product server > components. > Explore various underlying communication substrates. OODT currently > uses REST (via its Web-Grid component). > Create configuration-based OODT deployments. Currently the deployments > are primarily code-based, or the configuration is strewn about the > various sub-components. The goal would be to bring this configuration > under a single umbrella project. The idea would be to create science > data pipelines from configuration. > Explore Python-based client and server implementations of OODT and > implementations in other languages (Ruby). > > Current Status > > Meritocracy > Many of the proposed initial committers are familiar with the > meritocracy principles of Apache, and have already worked on the > various source codebases (contributing via patches, emails, JIRA > issues, and in Mattmann's case, as a Nutch, and Tika committer, and > Lucene PMC member). We will follow the normal meritocracy rules also > with other potential contributors. > > Community > There is an existing, established community of developers and users of > OODT within over 40 centers at NASA, NIH, DOE and academia, however > there is no Apache OODT community as of yet. Our principal goal of > this effort is to leverage the Apache Incubator to grow an Apache > community base (in addition to OODT's existing community), and to > build a self-sustaining community around this shared vision, and > eventual Apache TLP status for OODT. With many sub projects (CAS, > Product/Profile servers, Query Server, Web-grid, commons, etc.), OODT > should attract a broad audience of developers with various interests. > > Core Developers > The initial set of developers comes from NASA JPL, and with various > backgrounds, with different but compatible needs for the proposed > project. JPL is home to data management and grid projects spanning the > domains of cancer research/bioinformatics, earth science, planetary > science, astrophysics, and climate modeling. > > Alignment > As Apache's first grid-based framework will likely be widely used by > various open source, scientific and commercial projects both together > with and independent of other Apache tools. With OODT's existing > community we will also bring developers and organizations outside of > Apache into the Apache ecosystem. > > Known Risks > > Orphaned products > OODT has supported itself through successful deployments at NASA, at > the U.S. National Institutes of Health (NIH), and recently at > DOE-based laboratories and at academic centers. Further, OODT has been > an active participant in IEEE/ACM-based conferences and > meetings/journal publications over the past 9 years. There is active > support on several existing NASA earth science missions, and the team > at JPL is experienced and will continue to champion and develop OODT > in the Apache area. > > Our goal is to take OODT from the early stage of Apache Incubation > into a thriving Apache top-level project, and leverage it in the > existing manner at NASA, the NIH, at DOE, and in academia and > industry. Since OODT is a grid framework, it depends on many external > services and projects, no one of which controls OODT's code-base. > > We feel that the time is ripe to bring OODT into Apache and to grow > the community of developers who maintain OODT. We feel that Incubation > will bring a slew of industry-based developers (and even those in > academia, and government) who have no prior experience with OODT, but > who could use OODT at their jobs and who are attracted to the brand > name and community that Apache brings. We want to attract such > developers to become part of the core OODT development team, and > project management aspect. > > Inexperience with Open Source > All the initial developers have worked on open source before and at > least one (Mattmann) is a committer and PMC members in the Apache > Lucene ecosystem. Sean Kelly is a well-respected Plone committer and > has made several open source contributions over the years to FreeBSD > and other software. Foster, McCleese and Woollard have all contributed > to Apache projects by way of email, mailing lists, issue reporting and > testing. > > Homogenous Developers > The initial developers come from a variety of backgrounds and with a > variety of needs for the proposed framework. > > Reliance on Salaried Developers > All of the proposed initial developers are paid to work on this or > related projects, but the proposed project is not the primary task for > anyone. > > Relationships with Other Apache Products > OODT is related to at least the following Apache projects. None of the > projects is a direct competitor for OODT, but there are many cases of > potential overlap in functionality. > > Apache Lucene - The family of Lucene products that implement search > services are naturally of use in a grid environment such as OODT. In > fact, OODT has integrated with many of these projects (Tika, SOLR and > Lucene-java) already. We see OODT as a grid environment that makes use > of search services. > > Apache UIMA - The UIMA project provides a framework and pluggable > tools for analyzing text content and extracting information. Example > tools include language identification, sentence boundary detection and > "entity extraction" - finding references to people, places and > organizations. OODT is related to UIMA in the sense that it is a > framework to provide pluggable connections to content and information, > but the focus of OODT is on scientific data sets, and additional on > repositories and catalogs/registries that catalog information about > those datasets and that store the physical bits. Further, OODT is a > grid technology, meant to enable the creation of virtual > organizations, which is not UIMA's focus.Finally, OODT contains both > an information integration component, as well as a science data > processing component, which UIMA does not. > > OODT is also related to Apache projects involving databases, such as > the Apache DB project, however scientific data is not limited to > traditional DBMS'es and involves both structured and un-structured > information. However, there is likely much leveraging that can occur > as OODT can be updated to remove Hibernate-like dependencies, and > replace them with Derby-like dependencies. > > A Excessive Fascination with the Apache Brand > All of us are familiar with Apache and have a respect for its brand > and community. Though all of the proposed committers besides Mattmann > have not participated in Apache projects as committers, and PMC > members, many of them (McCleese, Foster, Woollard, Kelly) have > contributed via issue comments, patches, and tests for Apache projects > (including Maven, Tika, SOLR, and Lucene). Furthermore, some of the > proposed committers (Kelly) are major contributors in other open > source communities (e.g., Plone and Python). We feel that the Apache > Software Foundation is a natural home for a project like this. OODT > brings a credible, major grid-based software into the Apache > community, and Apache brings a huge community of eager and world-class > developers to help grow OODT's strengths and applicability across > projects and domains. > > Documentation > There is a wealth of documentation available on OODT. The best > starting point is the existing OODT JPL website (which will be ported > to be sync'ed or just a pointer to the Apache > website)http://oodt.jpl.nasa.gov > > OODT website at JPL > Mattmann's OODT paper that appeared at the 28th International > Conference on Software Engineering in Shanghai, China. > Crichton's seminal OODT paper appearing at the CODATA conference at > the U.S. National Academies of Science in 2000. > Google Scholar search on OODT. > Standards and conventions related to OODT include the Dublin Core > metadata set, ISO/IEC 11179, the HTTP 1.1 RFC, Grid-based standards > including the Open Grid Services Architecture (OGSA), and standards > for science data formats including Heirarchical Data Format (HDF), > netCDF and OPeNDAP. > > Initial Source > OODT will start with seed code donated by NASA JPL via Mattmann and > the rest of the initial committers. > > Source and Intellectual Property Submission Plan > All seed code and other contributions will be handled through the > normal Apache contribution process. Mattmann has been authorized by > NASA JPL to lead the contribution of OODT into the Incubator via his > existing Apache CLA. > > We will also contact other related efforts for possible cooperation > and contributions. > > External Dependencies > OODT depends on a number of external connector libraries with various > licensing conditions. An initial list of such dependencies (taken from > one of the OODT sub-components, the CAS file manager) is shown below. > > Library | License > commons-codec | AL v2 > commons-dbcp | AL v2 > commons-httpclient | AL v2 > commons-io | AL v2 > commons-pool | AL v2 > cas-metadata | (to be AL v2) > edm-commons | (to be AL v2) > hsqldb | LGPL v2.1 > jug-asl | AL v2 > lucene-core | AL v2 > xmlrpc | AL v2 > > There are also some LGPL components that would be useful. Whether and > how such dependencies could be handled will be discussed during > incubation. No such dependencies will be added to the project before > the legal implications have been cleared. Existing LGPL dependencies, > such as hsqldb above for the CAS file manager, will be removed and a > suitable ASL friendly alternative will be investigated and used to > replace the LGPL dependencies. > > Cryptography > OODT itself will not use cryptography, but it is possible that some of > the external product or profile server or CAS libraries will include > cryptographic code to handle features present in various science data > formats. The current OODT code base relies on Apache Tika which > contains an export control statement regarding cryptographic code per > Apache policy. We will follow a similar approach with OODT. Mattmann > led this effort in Apache Nutch and saw Jukka Zitting lead this effort > in Apache Tika, so he is familiar with this process. > > Required Resources > Mailing lists > > oodt-dev@incubator.apache.org > oodt-commits@incubator.apache.org > oodt-private@incubator.apache.org > Subversion Directory > > https://svn.apache.org/repos/asf/incubator/oodt > Issue Tracking > > JIRA OODT (OODT) > Other Resources > > OODT Wiki http://cwiki.apache.org/OODT > > Initial Committers > > Name | Email | Affiliation | CLA > Chris A. Mattmann | mattmann at apache dot org | NASA Jet Propulsion > Laboratory | yes > Daniel J. Crichton | crichton at jpl dot nasa dot gov | NASA Jet > Propulsion Laboratory | no > Paul Ramirez | pramirez at jpl dot nasa dot gov | NASA Jet Propulsion > Laboratory | no > Sean Kelly | kelly at jpl dot nasa dot gov | NASA Jet Propulsion > Laboratory | yes > Sean Hardman | shardman at jpl dot nasa dot gov | NASA Jet Propulsion > Laboratory | no > Andrew F. Hart | ahart at jpl dot nasa dot gov | NASA Jet Propulsion > Laboratory | no > Joshua Garcia | joshua at jpl dot nasa dot gov | NASA Jet Propulsion > Laboratory | no > David Woollard | woollard at jpl dot nasa dot gov | NASA Jet > Propulsion Laboratory | yes > Brian Foster | bfoster at jpl dot nasa dot gov | NASA Jet Propulsion > Laboratory | no > Sean McCleese | smcclees at jpl dot nasa dot gov | NASA Jet Propulsion > Laboratory | no > > Sponsors > > Champion > Justin Erenkrantz (jerenkrantz at apache dot org) > > Nominated Mentors > Justin Erenkrantz (jerenkrantz at apache dot org) > Ross Gardler (rgardler at apache dot org) > Jean-Frederic Clere (jfclere at apache dot org) > Ian Holsman (ianh at apache dot org) > > Sponsoring Entity > Apache Incubator > > --------------------------------------------------------------------- > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org > For additional commands, e-mail: general-help@incubator.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org For additional commands, e-mail: general-help@incubator.apache.org