incubator-cvs mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Incubator Wiki] Update of "OODTProposal" by chrismattmann
Date Thu, 31 Dec 2009 18:45:00 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "OODTProposal" page has been changed by chrismattmann.
http://wiki.apache.org/incubator/OODTProposal?action=diff&rev1=5&rev2=6

--------------------------------------------------

  
  == Known Risks ==
  === Orphaned products ===
- There are a number of projects at various stages of maturity that implement a subset of
the proposed features in Tika. For many potential users the existing tools are already enough,
which reduces the demand for a more generic toolkit. This can also be seen in the slow progress
of this proposal over the past year.
+ OODT has supported itself through successful deployments at NASA, at the U.S. National
Institutes of Health (NIH), and recently at DOE-based laboratories and at academic centers.
Further, OODT has been an active participant in IEEE/ACM-based conferences and meetings/journal
publications over the past 9 years. There is active support on several existing NASA earth
science missions, and the team at JPL is experienced and will continue to champion and develop
OODT in the Apache area.
  
- However, once the project gets started we can quickly reach the feature level of existing
tools based on seed code from sources mentioned below. After that we believe to be able to
quickly grow the developer and user communities based on the benefits of a generic toolkit
over custom alternatives.
+ Our goal is to take OODT from the early stage of Incubation into a thriving Apache top-level
project, and to continue leveraging it as we do today at NASA, the NIH, DOE, and in academia
and industry. Since OODT is a grid framework, it depends on many external services and projects,
none of which controls OODT's code base.
+ 
+ We feel that the time is ripe to bring OODT into Apache and
  
  === Inexperience with Open Source ===
- All the initial developers have worked on open source before and many are committers and
PMC members within other Apache projects.
+ All the initial developers have worked on open source before and at least one (Mattmann)
is a committer and PMC member in the Apache Lucene ecosystem. Sean Kelly is a well-respected
Plone committer and has made several open source contributions over the years to FreeBSD and
other software. Foster, McCleese and Woollard have all contributed to Apache projects by way
of email, mailing lists, issue reporting and testing.
  
  === Homogenous Developers ===
- The initial developers come from a variety of backgrounds and with a variety of needs for
the proposed toolkit.
+ The initial developers come from a variety of backgrounds and with a variety of needs for
the proposed framework.
  
  === Reliance on Salaried Developers ===
- Some of the developers are paid to work on this or related projects, but the proposed project
is not the primary task for anyone.
+ All of the proposed initial developers are paid to work on this or related projects, but
the proposed project is not the primary task for anyone.
  
  === Relationships with Other Apache Products ===
- Tika is related to at least the following Apache projects. None of the projects is a direct
competitor for Tika, but there are many cases of potential overlap in functionality.
+ OODT is related to at least the following Apache projects. None of the projects is a direct
competitor for OODT, but there are many cases of potential overlap in functionality.
  
-  * [[http://lucene.apache.org/java/|Apache Lucene]] - The analysis part of Lucene contains
code that might overlap with some of the potential Tika functionality. There might also be
some overlap regarding the Document model in Lucene.
-  * [[http://lucene.apache.org/nutch/|Lucene Nutch]] - The Nutch project already contains
a parser framework that does many of the things that Tika is designed to do.
-  * [[http://jackrabbit.apache.org/|Apache Jackrabbit]] - The Jackrabbit project contains
a text extraction component that also implements a subset of the proposed Tika features.
-  * [[http://incubator.apache.org/uima/|Apache UIMA]] - The UIMA project provides a framework
and pluggable tools for analyzing text content and extracting information. Example tools include
language identification, sentence boundary detection and "entity extraction" - finding references
to people, places and organisations. Tika could be used by UIMA to parse text but Tika should
be careful not to duplicate the subsequent text analysis features UIMA offers.
+  * [[http://lucene.apache.org/java/|Apache Lucene]] - The family of Lucene products that
implement search services is naturally of use in a grid environment such as OODT. In fact,
OODT has integrated with many of these projects (Tika, SOLR and Lucene-java) already. We see
OODT as a grid environment that makes use of search services.
+  * [[http://incubator.apache.org/uima/|Apache UIMA]] - The UIMA project provides a framework
and pluggable tools for analyzing text content and extracting information. Example tools include
language identification, sentence boundary detection and "entity extraction" - finding references
to people, places and organizations. OODT is related to UIMA in the sense that it is a framework
to provide pluggable connections to content and information, but the focus of OODT is on scientific
data sets, and additionally on repositories and catalogs/registries that catalog information
about those datasets and that store the physical bits. Further, OODT is a grid technology,
meant to enable the creation of virtual organizations, which is not UIMA's focus. Finally,
OODT contains both an information integration component and a science data processing
component, which UIMA does not.
+ 
+ OODT is also related to Apache projects involving databases, such as the [[http://db.apache.org/|Apache
DB]] project; however, scientific data is not limited to traditional DBMSes and involves both
structured and unstructured information. Even so, there is likely much that can be leveraged,
as OODT can be updated to remove Hibernate-like dependencies and replace them with
[[http://db.apache.org/derby/|Derby]]-like dependencies.
  
  === An Excessive Fascination with the Apache Brand ===
- All of us are familiar with Apache and we have participated in Apache projects as contributors,
committers, and PMC members. We feel that the Apache Software Foundation is a natural home
for a project like this.
+ All of us are familiar with Apache and have a respect for its brand and community. Though
none of the proposed committers besides Mattmann has participated in Apache projects as a
committer or PMC member, many of them (McCleese, Foster, Woollard, Kelly) have contributed
via issue comments, patches, and tests for Apache projects (including Maven, Tika, SOLR, and
Lucene). Furthermore, some of the proposed committers (Kelly) are major contributors in other
open source communities (e.g., [[http://plone.org|Plone]] and Python). We feel that the Apache
Software Foundation is a natural home for a project like this. OODT brings a credible, major
piece of grid-based software into the Apache community, and Apache brings a huge community of
eager and world-class developers to help grow OODT's strengths and applicability across projects
and domains.
  
  == Documentation ==
- There is a wealth of documentation available on OODT. The best starting point is the existing
OODT JPL website (which will be ported to be sync'ed or just a pointer to the Apache website)[[http://oodt.jpl.nasa.gov]]
+ There is a wealth of documentation available on OODT. The best starting point is the existing
OODT JPL website (which will either be kept in sync with the Apache website or become a pointer to it): http://oodt.jpl.nasa.gov
  
   * [[http://oodt.jpl.nasa.gov|OODT website at JPL]]
-  * Mattmann's OODT paper that appeared at the 28th International Conference on Software
Engineering in Shanghai, China.
-  * Crichton's seminal OODT paper appearing at the CODATA conference.
-  * Google Scholar search on OODT
+  * Mattmann's [[http://csse.usc.edu/~mattmann/pubs/ICSE06.pdf|OODT paper]] that appeared
at the [[http://www.isr.uci.edu/icse-06/|28th International Conference on Software Engineering]]
in Shanghai, China.
+  * Crichton's [[http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/14067/1/00-0464.pdf|seminal
OODT paper]] appearing at the CODATA conference at the U.S. National Academies of Science
in 2000.
+  * [[http://scholar.google.com/scholar?hl=en&q=OODT&btnG=Search&as_sdt=2000&as_ylo=&as_vis=0|Google
Scholar search on OODT]].
  
- Standards and conventions related to OODT include the [[http://dublincore.org/|Dublin Core]]
metadata set, [[http://www.iso.org/iso/catalogue_detail.htm?csnumber=1758|ISO/IEC 11179]],
the [[http://www.w3.org/Protocols/rfc2616/rfc2616.html|HTTP 1.1 RFC]], Grid-based standards
including the [[http://www.globus.org/alliance/publications/papers/ogsa.pdf|Open Grid Services
Architecture (OGSA)]] adnasd ads
+ Standards and conventions related to OODT include the [[http://dublincore.org/|Dublin Core]]
metadata set, [[http://www.iso.org/iso/catalogue_detail.htm?csnumber=1758|ISO/IEC 11179]],
the [[http://www.w3.org/Protocols/rfc2616/rfc2616.html|HTTP 1.1 RFC]], Grid-based standards
including the [[http://www.globus.org/alliance/publications/papers/ogsa.pdf|Open Grid Services
Architecture (OGSA)]], and standards for science data formats including [[http://www.hdfgroup.org/|Hierarchical
Data Format (HDF)]], [[http://www.unidata.ucar.edu/software/netcdf/|netCDF]] and [[http://opendap.org|OPeNDAP]].
  
  == Initial Source ==
  OODT will start with seed code donated by NASA JPL via Mattmann and the rest of the initial
committers.
@@ -109, +111 @@

  
  == External Dependencies ==
  OODT will depend on a number of external connector libraries with various licensing
conditions. An initial list of such dependencies (taken from one of the OODT sub-components,
the CAS file manager) is shown below.
+ ||<tableclass="bodyTable"rowclass="b">'''Library''' ||'''License''' ||
+ ||<rowclass="b">commons-codec ||ASL v2 ||
+ ||<rowclass="a">commons-dbcp ||ASL v2 ||
+ ||<rowclass="b">commons-httpclient ||ASL v2 ||
+ ||<rowclass="a">commons-io ||ASL v2 ||
+ ||<rowclass="b">commons-pool ||ASL v2 ||
+ ||<rowclass="a">cas-metadata ||(to be ASL v2) ||
+ ||<rowclass="b">edm-commons ||(to be ASL v2) ||
+ ||<rowclass="a">hsqldb ||LGPL v2.1 ||
+ ||<rowclass="b">jug-asl ||ASL v2 ||
+ ||<rowclass="a">lucene-core ||ASL v2 ||
+ ||<rowclass="b">xmlrpc ||ASL v2 ||
  
+ 
- ||<tableclass="bodyTable"rowclass="b">'''Library'''||'''License'''||
- ||<rowclass="b">commons-codec||ASL v2||
- ||<rowclass="a">commons-dbcp||ASL v2||
- ||<rowclass="b">commons-httpclient||ASL v2||
- ||<rowclass="a">commons-io||ASL v2||
- ||<rowclass="b">commons-pool||ASL v2||
- ||<rowclass="a">cas-metadata||(to be ASL v2)||
- ||<rowclass="b">edm-commons||(to be ASL v2)||
- ||<rowclass="a">hsqldb||LGPL v2.1||
- ||<rowclass="b">jug-asl||ASL v2||
- ||<rowclass="a">lucene-core||ASL v2||
- ||<rowclass="b">xmlrpc||ASL v2||
  
  
  There are also some LGPL parser libraries that would be useful. Whether and how such dependencies
could be handled will be discussed during incubation. No such dependencies will be added to
the project before the legal implications have been cleared. Existing LGPL dependencies, such
as hsqldb above for the CAS file manager, will be removed and replaced with suitable
ASL-friendly alternatives.
  
  == Cryptography ==
- OODT itself will not use cryptography, but it is possible that some of the external product
or profile server or CAS libraries will include cryptographic code to handle features like
DRM in various science data formats. The current OODT code base relies on [[http://lucene.apache.org/tika/|Apache
Tika]] which contains an export control statement regarding cryptographic code per Apache
policy. We will follow a similar approach with OODT. Mattmann lead this effort in [[http://lucene.apache.org/nutch/|Apache
Nutch]] and saw Jukka Zitting lead this effort in Apache Tika, so he is familiar with this
process.
+ OODT itself will not use cryptography, but it is possible that some of the external product
or profile server or CAS libraries will include cryptographic code to handle features present
in various science data formats. The current OODT code base relies on [[http://lucene.apache.org/tika/|Apache
Tika]] which contains an export control statement regarding cryptographic code per Apache
policy. We will follow a similar approach with OODT. Mattmann led this effort in [[http://lucene.apache.org/nutch/|Apache
Nutch]] and saw Jukka Zitting lead this effort in Apache Tika, so he is familiar with this
process.
  
  == Required Resources ==
  Mailing lists
@@ -149, +152 @@

   * OODT Wiki http://cwiki.apache.org/OODT
  
  == Initial Committers ==
- ||'''Name''' ||'''Email''' || ||'''Affiliation'''||'''CLA''' ||
+ ||'''Name''' ||'''Email''' ||'''Affiliation''' ||'''CLA''' ||
- ||Chris A. Mattmann ||mattmann at apache dot org || ||[[http://www.jpl.nasa.gov/|NASA Jet
Propulsion Laboratory]]||yes ||
+ ||Chris A. Mattmann ||mattmann at apache dot org ||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion
Laboratory]] ||yes ||
- ||Daniel J. Crichton ||crichton at jpl dot nasa dot gov || ||[[http://www.jpl.nasa.gov/|NASA
Jet Propulsion Laboratory]]||no||
+ ||Daniel J. Crichton ||crichton at jpl dot nasa dot gov ||[[http://www.jpl.nasa.gov/|NASA
Jet Propulsion Laboratory]] ||no ||
- ||Paul Ramirez ||pramirez at jpl dot nasa dot gov || ||[[http://www.jpl.nasa.gov/|NASA Jet
Propulsion Laboratory]]||no ||
+ ||Paul Ramirez ||pramirez at jpl dot nasa dot gov ||[[http://www.jpl.nasa.gov/|NASA Jet
Propulsion Laboratory]] ||no ||
- ||Sean Kelly ||kelly at jpl dot nasa dot gov || ||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion
Laboratory]]||no ||
+ ||Sean Kelly ||kelly at jpl dot nasa dot gov ||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion
Laboratory]] ||no ||
- ||Sean Hardman ||shardman at jpl dot nasa dot gov || ||[[http://www.jpl.nasa.gov/|NASA Jet
Propulsion Laboratory]]||no ||
+ ||Sean Hardman ||shardman at jpl dot nasa dot gov ||[[http://www.jpl.nasa.gov/|NASA Jet
Propulsion Laboratory]] ||no ||
- ||Andrew F. Hart||ahart at jpl dot nasa dot gov|| ||[[http://www.jpl.nasa.gov/|NASA Jet
Propulsion Laboratory]]||no||
+ ||Andrew F. Hart ||ahart at jpl dot nasa dot gov ||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion
Laboratory]] ||no ||
- ||Joshua Garcia||joshua at jpl dot nasa dot gov|| ||[[http://www.jpl.nasa.gov/|NASA Jet
Propulsion Laboratory]] ||no||
+ ||Joshua Garcia ||joshua at jpl dot nasa dot gov ||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion
Laboratory]] ||no ||
- ||David Woollard||woollard at jpl dot nasa dot gov|| ||[[http://www.jpl.nasa.gov/|NASA Jet
Propulsion Laboratory]]||no||
+ ||David Woollard ||woollard at jpl dot nasa dot gov ||[[http://www.jpl.nasa.gov/|NASA Jet
Propulsion Laboratory]] ||no ||
- ||Brian Foster||bfoster at jpl dot nasa dot gov|| ||[[http://www.jpl.nasa.gov/|NASA Jet
Propulsion Laboratory]]||no||
+ ||Brian Foster ||bfoster at jpl dot nasa dot gov ||[[http://www.jpl.nasa.gov/|NASA Jet Propulsion
Laboratory]] ||no ||
- ||Sean McCleese||smcclees at jpl dot nasa dot gov|| ||[[http://www.jpl.nasa.gov/|NASA Jet
Propulsion Laboratory]]||no||
+ ||Sean McCleese ||smcclees at jpl dot nasa dot gov ||[[http://www.jpl.nasa.gov/|NASA Jet
Propulsion Laboratory]] ||no ||
- 
- 
  
  
  == Sponsors ==


