Return-Path: Delivered-To: apmail-incubator-general-archive@www.apache.org Received: (qmail 25923 invoked from network); 6 Jan 2010 09:56:36 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 6 Jan 2010 09:56:36 -0000 Received: (qmail 24600 invoked by uid 500); 6 Jan 2010 09:56:35 -0000 Delivered-To: apmail-incubator-general-archive@incubator.apache.org Received: (qmail 24387 invoked by uid 500); 6 Jan 2010 09:56:34 -0000 Mailing-List: contact general-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: general@incubator.apache.org Delivered-To: mailing list general@incubator.apache.org Received: (qmail 24377 invoked by uid 99); 6 Jan 2010 09:56:34 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 06 Jan 2010 09:56:34 +0000 X-ASF-Spam-Status: No, hits=1.2 required=10.0 tests=SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (athena.apache.org: local policy) Received: from [82.71.204.226] (HELO cpanelsmarthost2.zen.co.uk) (82.71.204.226) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 06 Jan 2010 09:56:24 +0000 Received: from [82.71.204.9] (helo=zencphosting06.zen.co.uk) by cpanelsmarthost2.zen.co.uk with esmtp (Exim 4.63) (envelope-from ) id 1NSScU-0002f2-4v for general@incubator.apache.org; Wed, 06 Jan 2010 09:56:02 +0000 Received: from client0375.vpn.ox.ac.uk ([129.67.117.119]) by zencphosting06.zen.co.uk with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.69) (envelope-from ) id 1NSScT-0001Nm-Ie for general@incubator.apache.org; Wed, 06 Jan 2010 09:56:02 +0000 Message-ID: <4B445E31.9030606@apache.org> Date: Wed, 06 Jan 2010 09:56:01 +0000 From: Ross Gardler User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.5) Gecko/20091204 Lightning/1.0b2pre Thunderbird/3.0 MIME-Version: 1.0 To: general@incubator.apache.org Subject: Re: [PROPOSAL] OODT: a grid middleware framework for science data processing, information integration, and retrieval References: <5c902b9e1001041146g8c778d5t53fe7ea6f719c0f5@mail.gmail.com> In-Reply-To: <5c902b9e1001041146g8c778d5t53fe7ea6f719c0f5@mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - zencphosting06.zen.co.uk X-AntiAbuse: Original Domain - incubator.apache.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - apache.org Count me in as a mentor for this excellent proposal. Ross On 04/01/2010 19:46, Justin Erenkrantz wrote: > Hi general@, > > On behalf of the OODT community, I'd like to bring the following > proposal for discussion within the Incubator. I've talked with Chris > and Dave about bringing this project to Apache since May of 2007 (ICSE > in Minneapolis at the Ruth's Chris Steak house!) - so I'm very happy > to see this proposal arrive and be a mentor for it. > > Comments/suggestions/mentor volunteers welcome. > > Thanks! -- justin > > http://wiki.apache.org/incubator/OODTProposal > > --- Current Wiki Text below --- > > OODT, a grid middleware framework for science data processing, > information integration, and retrieval. > > Abstract > > OODT is a grid middleware framework used on a number of successful > projects at NASA's Jet Propulsion Laboratory/California Institute of > Technology, and many other research institutions and universities, > specifically those part of the: > > * National Cancer Institute's (NCI's) Early Detection Research > Network (EDRN) project - over 40+ institutions all performing research > into discovering biomarkers which are early indicators of disease. > * NASA's Planetary Data System (PDS) - NASA's planetary data > archive, a repository and registry for all planetary data collected > over the past 30+ years. > * various Earth Science data processing missions, including > Seawinds/QuickSCAT, the Orbiting Carbon Observatory, the NPP Sounder > PEATE project, and the Soil Moisture Active Passive (SMAP) mission. > > From the OODT website: > > It's middleware for metadata: > > * Transparent access to distributed resources > * Data discovery and query optimization > * Distributed processing and virtual archives > > It's a software architecture: > > * Models for information representation > * Solutions to knowledge capture problems > * Unification of technology, data, and metadata > > Proposal > > OODT is an established open source project, with 9+ years of > existence, and deployment at universities, federal research > institutions, other NASA centers, and the NIH (it won runner-up NASA > software of the year in 2003). It has a strong community of those that > operate and support its growth. Our proposal is to bring OODT into > Apache to strengthen its support and its capabilities even further on > the laurels of Apache's brand and its growing huge community of > developers from all over the world. In short, bringing OODT into > Apache will significantly enhance OODT's widespread use, will likely > improve its codebase, and furthermore will help Apache philosophy and > community spread into OODT's already large community-base reaching > across government, academia and industry. > > OODT will be, to the best of our knowledge, the first grid community > project to bear the Apache brand. By grid technology, we mean a > technology that provides the ability to create virtual organizations, > as originally described by Kesselman and Foster in their seminal paper > on grid computing. OODT provides both computational and data grid > support, and is built with a component-philosophy. OODT includes > components that allow for virtual information integration across > organizations (provided by the Profile, Product and Query server > components), and that allow for distributed data management and > processing across heterogeneous virtual organizations (provided by the > Catalog and Archive Service set of components, including File Manager, > Workflow Manager and Resource Manager). > > Each set of components exist as independently organized Maven2 > projects, that reference each other (where appropriate), forming a > layered set of components and a framework for grid computing. > > Background > > OODT is an established project within JPL and in use at several NASA > centers, as well as univerities, and other government organizations > and industrial collaborations. Chris Mattmann, a JPL employee, and ASF > PMC (Lucene) and Committer (Nutch, Tika), has been working for the > past 2 years on obtaining the necessary permission from JPL to release > OODT into Apache. After initially being stalled, JPL has granted > permission to allow OODT into Apache. > > Through his academic relationship with Justin Erekrantz, Apache > President, and through their collective Ph.D. studies, OODT has been > discussed between Chris and Justin on several occasions, and Justin > offered to help champion OODT into the Apache Incubator when JPL was > ready to release OODT. In December 2009, that permission was granted. > > This proposal is the result of the above efforts and related > discussions. Some alternatives to incubation, like Apache Labs came up > during the discussions but we believe that taking the project to the > Incubator is the best way to start growing a viable Apache-based > community to sustain OODT. Furthermore, given its larger code base and > existing sub-projects, the goal would be for OODT to leverage the > incubator to graduate into Apache's first top-level grid project, > rather than graduate into a sub-project of an existing TLP. > > Rationale > > Grid computing has been around for the past 10 years and has gained > widespread notoriety and attention in industry and academia. > Scientific collaborations are increasingly virtual and require the > capabilities (data and compute) of thousands of computers and > resources that span organizations. There are a number of existing grid > technologies (Globus being the most popular, DSpace, iRODS/Storage > Resource Broker, see this paper for a full study), however Apache has > no current grid technology under its umbrella and world reknown think > tank. Morever, efforts are few and far between in terms of standing up > Apache-based software that is applicable to the scientific community > and grid community outside of use of fine-grained components in these > systems. Other open source organizations (e.g., the Global > Organization for Earth System Science Portals, GO-ESSP) have embraced > the construction of such technology and there is a lot of work going > on, e.g., at NOAA. This proposal aims to remedy this fact and to bring > scientific data management/grid software into the Apache family and > its worldwide community. > > OODT is a widely successful grid project with applicability and > existing deployments across broad-reaching domains (planetary and > earth sciences, cancer research/biomedicine, climate modeling and > atmospheric science, etc.). The marriage of OODT and Apache will > engender OODT's widespread, global use via the Apache brand, and will > make Apache a player in the grid/scientific data community. > > Initial Goals > > The initial goals of the proposed project are: > > * Stand up a sustaining Apache-based community around the OODT codebase. > * Active relationships and possible cooperation with related > projects and communities. > * Refactor and bring up-to-date the OODT profile and product > server components. > * Explore various underlying communication substrates. OODT > currently uses REST (via its Web-Grid component). > * Create configuration-based OODT deployments. Currently the > deployments are primarily code-based, or the configuration is strewn > about the various sub-components. The goal would be to bring this > configuration under a single umbrella project. The idea would be to > create science data pipelines from configuration. > * Explore Python-based client and server implementations of OODT > and implementations in other languages (Ruby). > > Current Status > > Meritocracy > > Many of the proposed initial committers are familiar with the > meritocracy principles of Apache, and have already worked on the > various source codebases (contributing via patches, emails, JIRA > issues, and in Mattmann's case, as a Nutch, and Tika committer, and > Lucene PMC member). We will follow the normal meritocracy rules also > with other potential contributors. > > Community > > There is an existing, established community of developers and users of > OODT within over 40 centers at NASA, NIH, DOE and academia, however > there is no Apache OODT community as of yet. Our principal goal of > this effort is to leverage the Apache Incubator to grow an Apache > community base (in addition to OODT's existing community), and to > build a self-sustaining community around this shared vision, and > eventual Apache TLP status for OODT. With many sub projects (CAS, > Product/Profile servers, Query Server, Web-grid, commons, etc.), OODT > should attract a broad audience of developers with various interests. > > Core Developers > > The initial set of developers comes from NASA JPL, and with various > backgrounds, with different but compatible needs for the proposed > project. JPL is home to data management and grid projects spanning the > domains of cancer research/bioinformatics, earth science, planetary > science, astrophysics, and climate modeling. > > Alignment > > As Apache's first grid-based framework will likely be widely used by > various open source, scientific and commercial projects both together > with and independent of other Apache tools. With OODT's existing > community we will also bring developers and organizations outside of > Apache into the Apache ecosystem. > > Known Risks > > Orphaned products > > OODT has supported itself through successful deployments at NASA, at > the U.S. National Institutes of Health (NIH), and recently at > DOE-based laboratories and at academic centers. Further, OODT has been > an active participant in IEEE/ACM-based conferences and > meetings/journal publications over the past 9 years. There is active > support on several existing NASA earth science missions, and the team > at JPL is experienced and will continue to champion and develop OODT > in the Apache area. > > Our goal is to take OODT from the early stage of Apache Incubation > into a thriving Apache top-level project, and leverage it in the > existing manner at NASA, the NIH, at DOE, and in academia and > industry. Since OODT is a grid framework, it depends on many external > services and projects, no one of which controls OODT's code-base. > > We feel that the time is ripe to bring OODT into Apache and to grow > the community of developers who maintain OODT. We feel that Incubation > will bring a slew of industry-based developers (and even those in > academia, and government) who have no prior experience with OODT, but > who could use OODT at their jobs and who are attracted to the brand > name and community that Apache brings. We want to attract such > developers to become part of the core OODT development team, and > project management aspect. > > Inexperience with Open Source > > All the initial developers have worked on open source before and at > least one (Mattmann) is a committer and PMC members in the Apache > Lucene ecosystem. Sean Kelly is a well-respected Plone committer and > has made several open source contributions over the years to FreeBSD > and other software. Foster, McCleese and Woollard have all contributed > to Apache projects by way of email, mailing lists, issue reporting and > testing. > > Homogenous Developers > > The initial developers come from a variety of backgrounds and with a > variety of needs for the proposed framework. > > Reliance on Salaried Developers > > All of the proposed initial developers are paid to work on this or > related projects, but the proposed project is not the primary task for > anyone. > > Relationships with Other Apache Products > > OODT is related to at least the following Apache projects. None of the > projects is a direct competitor for OODT, but there are many cases of > potential overlap in functionality. > > * Apache Lucene - The family of Lucene products that implement > search services are naturally of use in a grid environment such as > OODT. In fact, OODT has integrated with many of these projects (Tika, > SOLR and Lucene-java) already. We see OODT as a grid environment that > makes use of search services. > * Apache UIMA - The UIMA project provides a framework and > pluggable tools for analyzing text content and extracting information. > Example tools include language identification, sentence boundary > detection and "entity extraction" - finding references to people, > places and organizations. OODT is related to UIMA in the sense that it > is a framework to provide pluggable connections to content and > information, but the focus of OODT is on scientific data sets, and > additional on repositories and catalogs/registries that catalog > information about those datasets and that store the physical bits. > Further, OODT is a grid technology, meant to enable the creation of > virtual organizations, which is not UIMA's focus.Finally, OODT > contains both an information integration component, as well as a > science data processing component, which UIMA does not. > > OODT is also related to Apache projects involving databases, such as > the Apache DB project, however scientific data is not limited to > traditional DBMS'es and involves both structured and un-structured > information. However, there is likely much leveraging that can occur > as OODT can be updated to remove Hibernate-like dependencies, and > replace them with Derby-like dependencies. > > A Excessive Fascination with the Apache Brand > > All of us are familiar with Apache and have a respect for its brand > and community. Though all of the proposed committers besides Mattmann > have not participated in Apache projects as committers, and PMC > members, many of them (McCleese, Foster, Woollard, Kelly) have > contributed via issue comments, patches, and tests for Apache projects > (including Maven, Tika, SOLR, and Lucene). Furthermore, some of the > proposed committers (Kelly) are major contributors in other open > source communities (e.g., Plone and Python). We feel that the Apache > Software Foundation is a natural home for a project like this. OODT > brings a credible, major grid-based software into the Apache > community, and Apache brings a huge community of eager and world-class > developers to help grow OODT's strengths and applicability across > projects and domains. > > Documentation > > There is a wealth of documentation available on OODT. The best > starting point is the existing OODT JPL website (which will be ported > to be sync'ed or just a pointer to the Apache > website)http://oodt.jpl.nasa.gov > > * OODT website at JPL > * Mattmann's OODT paper that appeared at the 28th International > Conference on Software Engineering in Shanghai, China. > * Crichton's seminal OODT paper appearing at the CODATA conference > at the U.S. National Academies of Science in 2000. > * Google Scholar search on OODT. > > Standards and conventions related to OODT include the Dublin Core > metadata set, ISO/IEC 11179, the HTTP 1.1 RFC, Grid-based standards > including the Open Grid Services Architecture (OGSA), and standards > for science data formats including Heirarchical Data Format (HDF), > netCDF and OPeNDAP. > > Initial Source > > OODT will start with seed code donated by NASA JPL via Mattmann and > the rest of the initial committers. > > Source and Intellectual Property Submission Plan > > All seed code and other contributions will be handled through the > normal Apache contribution process. Mattmann has been authorized by > NASA JPL to lead the contribution of OODT into the Incubator via his > existing Apache CLA. > > We will also contact other related efforts for possible cooperation > and contributions. > > External Dependencies > > OODT depends on a number of external connector libraries with various > licensing conditions. An initial list of such dependencies (taken from > one of the OODT sub-components, the CAS file manager) is shown below. > > Library | License > commons-codec | AL v2 > commons-dbcp | AL v2 > commons-httpclient | AL v2 > commons-io | AL v2 > commons-pool | AL v2 > cas-metadata | (to be AL v2) > edm-commons | (to be AL v2) > hsqldb | LGPL v2.1 > jug-asl | AL v2 > lucene-core | AL v2 > xmlrpc | AL v2 > > There are also some LGPL components that would be useful. Whether and > how such dependencies could be handled will be discussed during > incubation. No such dependencies will be added to the project before > the legal implications have been cleared. Existing LGPL dependencies, > such as hsqldb above for the CAS file manager, will be removed and a > suitable ASL friendly alternative will be investigated and used to > replace the LGPL dependencies. > > Cryptography > > OODT itself will not use cryptography, but it is possible that some of > the external product or profile server or CAS libraries will include > cryptographic code to handle features present in various science data > formats. The current OODT code base relies on Apache Tika which > contains an export control statement regarding cryptographic code per > Apache policy. We will follow a similar approach with OODT. Mattmann > led this effort in Apache Nutch and saw Jukka Zitting lead this effort > in Apache Tika, so he is familiar with this process. > > Required Resources > > Mailing lists > > * oodt-dev@incubator.apache.org > * oodt-commits@incubator.apache.org > * oodt-private@incubator.apache.org > > Subversion Directory > > * https://svn.apache.org/repos/asf/incubator/oodt > > Issue Tracking > > * JIRA OODT (OODT) > > Other Resources > > * OODT Wiki http://cwiki.apache.org/OODT > > Initial Committers > > Name | Email | Affiliation | CLA > > Chris A. Mattmann | mattmann at apache dot org | NASA Jet Propulsion > Laboratory | yes > Daniel J. Crichton | crichton at jpl dot nasa dot gov | NASA Jet > Propulsion Laboratory | no > Paul Ramirez | pramirez at jpl dot nasa dot gov | NASA Jet Propulsion > Laboratory | no > Sean Kelly | kelly at jpl dot nasa dot gov | NASA Jet Propulsion Laboratory | no > Sean Hardman | shardman at jpl dot nasa dot gov | NASA Jet Propulsion > Laboratory | no > Andrew F. Hart | ahart at jpl dot nasa dot gov | NASA Jet Propulsion > Laboratory | no > Joshua Garcia | joshua at jpl dot nasa dot gov | NASA Jet Propulsion > Laboratory | no > David Woollard | woollard at jpl dot nasa dot gov | NASA Jet > Propulsion Laboratory | no > Brian Foster | bfoster at jpl dot nasa dot gov | NASA Jet Propulsion > Laboratory | no > Sean McCleese | smcclees at jpl dot nasa dot gov | NASA Jet Propulsion > Laboratory | no > > Sponsors > > Champion > > * Justin Erenkrantz (jerenkrantz at apache dot org) > > Nominated Mentors > > * Justin Erenkrantz (jerenkrantz at apache dot org) > > Sponsoring Entity > > * Apache Incubator > > --------------------------------------------------------------------- > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org > For additional commands, e-mail: general-help@incubator.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org For additional commands, e-mail: general-help@incubator.apache.org