incubator-general mailing list archives

From Chris Mattmann <mattm...@apache.org>
Subject Re: [VOTE] Accept Science Data Analytics Platform (SDAP) into Apache Incubator WAS Re: [DISCUSS] Accept Science Data Analytics Platform (SDAP) into Apache Incubator
Date Tue, 17 Oct 2017 21:21:47 GMT
+1 binding from me…thanks and good luck

Cheers,
Chris





On 10/17/17, 2:04 PM, "lewis john mcgibbney" <lewismc@apache.org> wrote:

    Hi Folks,
    Having secured a mentorship team consisting of the following IPMC Members,
    I am happy to open a formal VOTE thread on accepting the Science Data
    Analytics Platform (SDAP) into Apache Incubator.
    
       - Lewis John McGibbney (lewismc@apache.org)
       - Raphael Bircher (bircher at apache dot org)
       - Suneel Marthi (smarthi at apache dot org)
    
    Thank you to both Raphael and Suneel for coming forward. :)
    The VOTE will be open for at least 72 hours.
    
    [ ] +1 Accept Science Data Analytics Platform (SDAP) into Apache Incubator
    [ ] +/-0 ... just because
    [ ] -1 Do NOT Accept Science Data Analytics Platform (SDAP) into Apache
    Incubator... because
    
    Thanks in advance to all participants.
    Lewis
    
    P.S. Here is a binding +1 from me
    
    On Wed, Oct 11, 2017 at 11:22 AM, lewis john mcgibbney <lewismc@apache.org>
    wrote:
    
    > Hi Folks,
    > I would like to open a DISCUSS thread on the topic of accepting the
    > Science Data Analytics Platform (SDAP)
    > <https://wiki.apache.org/incubator/SDAPProposal> Project into the Incubator.
    > I am CC'ing Thomas Huang from NASA JPL who I have been working with to
    > build community around a kick-ass set of software projects under the SDAP
    > umbrella.
    > At this stage we would very much appreciate critical feedback from general@
    > community. We are also open to mentors who may have an interest in the
    > project proposal.
    > The proposal is pasted below.
    > Thanks in advance,
    > Lewis
    >
    > = Abstract =
    > The Science Data Analytics Platform (SDAP) establishes an integrated data
    > analytic center for Big Science problems. It focuses on technology
    > integration, advancement and maturity.
    >
    > = Proposal =
    > SDAP currently represents a collaboration between NASA Jet Propulsion
    > Laboratory (JPL), Florida State University (FSU), the National Center for
    > Atmospheric Research (NCAR), and George Mason University (GMU). SDAP brings
    > together a number of NASA-funded big data technologies, including
    > OceanXtremes (anomaly detection and ocean science), NEXUS (deep data
    > analytic platform), DOMS (distributed in-situ to satellite matchup), MUDROD
    > (search relevancy and discovery) and VQSS (Virtualized Quality Screening
    > Service), under a single umbrella. VQSS will not be included in the
    > original Incubator proposal; however, it is anticipated that a future source
    > code donation will cover VQSS.
    >
    > = Background and Rationale =
    > SDAP is a technology software solution currently geared to better enable
    > scientists involved in advancing the study of the Earth's physical
    > oceanography. With increasing global temperature, warming of the ocean, and
    > melting ice sheets and glaciers, the impacts can be observed from changes
    > in anomalous ocean temperature and circulation patterns, to increasing
    > extreme weather events and stronger/more frequent hurricanes, sea level
    > rise and storm surges affecting coastlines, and may involve drastic changes
    > and shifts in marine ecosystems. Ocean science communities are relying on
    > data distributed through data centers such as the JPL's Physical
    > Oceanographic Data Active Archive Center (PO.DAAC) to conduct their
    > research. In typical investigations, oceanographers follow a traditional
    > workflow for using datasets: search, evaluate, download, and apply tools
    > and algorithms to look for trends. While this workflow has been working
    > very well historically for the oceanographic community, it cannot scale if
    > the research involves massive amounts of data. NASA's Surface Water and
    > Ocean Topography (SWOT) mission, scheduled to launch in April of 2021, is
    > expected to generate over 20 PB of data for a nominal 3-year mission. This will
    > challenge all existing NASA Earth Science data archival/distribution
    > paradigms. It will no longer be feasible for Earth scientists to download
    > and analyze such volumes of data. SDAP was therefore developed primarily as
    > a Web-service platform for big ocean data science at the PO.DAAC with open
    > source solutions used to enable fast analysis of oceanographic data. SDAP
    > has been developed collaboratively between JPL, FSU, NCAR, and GMU and is
    > rapidly maturing to become the generic platform for the next generation of
    > big science data solutions. The platform is an orchestration of several
    > previously funded NASA big ocean data solutions using cloud technology,
    > which include data analysis (NEXUS), anomaly detection (OceanXtremes),
    > matchup (DOMS), subsetting, discovery (MUDROD), and visualization (VQSS).
    > SDAP will enable web-accessible, fast data analysis directly on huge
    > scientific data archives to minimize data movement and provide access,
    > including subset, only to the relevant data.
    >
    > = Science Data Analytics Platform Project Overview =
    > SDAP consists of several loosely coupled, independently functioning
    > sub-projects. The graphic below displays an overview of how these
    > sub-projects fuse together. N.B., although the graphic uses terminology
    > relating to OceanWorks, essentially the SDAP architecture is identical.
    >
    > {{attachment:sdap.png}}
    >
    > == OceanXtremes ==
    > Oceanographic Data-Intensive Anomaly Detection and Analysis Portal. An
    > application that allows you to view imagery and perform analysis on sea
    > level rise data.
    >
    > '''Objective'''
    > Develop an anomaly detection system which identifies items, events or
    > observations which do not conform to an expected pattern.
    >  * Mature and test domain-specific, multi-scale anomaly and feature
    > detection algorithms.
    >  * Identify unexpected correlations between key measured variables.
    >
    > Demonstrate value of technologies in this service:
    >  * Adapted Map-Reduce data mining.
    >  * Algorithm profiling service.
    >  * Shared discovery and exploration search tools.
    >  * Automatic notification of events of interest.
    >
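The OceanXtremes objective above (flagging observations that do not conform to an expected pattern) can be illustrated with a minimal sketch. This is not OceanXtremes code: the function name `find_anomalies`, the threshold, and the sample values are hypothetical, and a plain z-score test against a baseline stands in for the project's domain-specific algorithms, which the proposal does not describe.

```python
from statistics import mean, stdev


def find_anomalies(baseline, observations, threshold=3.0):
    """Return (index, value) pairs that deviate from the baseline mean
    by more than `threshold` sample standard deviations."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [
        (i, x)
        for i, x in enumerate(observations)
        if abs(x - mu) > threshold * sigma
    ]


# Hypothetical sea surface temperatures in degrees Celsius.
baseline = [20.0, 20.5, 19.5, 20.2, 19.8]
observations = [20.1, 23.0, 19.9, 17.5]
anomalies = find_anomalies(baseline, observations)
```

Here the second and fourth observations are flagged, since both lie well outside three standard deviations of the baseline.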
    > == NEXUS ==
    > NEXUS is an emerging technology developed at JPL:
    >  * A Cloud-based/Cluster-based data platform, designed to scale
    > horizontally, that performs scalable analysis of observational parameters
    >  * Leverages a high-performance indexed, temporal, and geospatial search
    > solution
    >  * Breaks data products into small chunks and stores them in a Cloud-based
    > data store
    >
    > ''Data Volumes Exploding''
    >  * SWOT mission is coming
    >  * File I/O is slow
    >
    > ''Scalable Store & Compute is Available''
    >  * NoSQL cluster databases
    >  * Parallel compute, in-memory map-reduce
    >  * Bring Compute to Highly-Accessible Data (using Hybrid Cloud)
    >
    > ''Pre-Chunk and Summarize Key Variables''
    >  * Easy statistics instantly (milliseconds)
    >  * Harder statistics on-demand (in seconds)
    >  * Visualize original data (layers) on a map quickly
    >
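The "pre-chunk and summarize" idea in the NEXUS bullets above can be sketched as follows. This is not NEXUS code: the names `tile_grid`, `summarize`, `build_summaries` and `global_mean`, the tile size, and the toy grid are all hypothetical, and an in-memory dictionary stands in for the cloud data store. The sketch splits a 2-D grid into small tiles and precomputes per-tile statistics, so that a simple statistic over the whole grid can be answered from the summaries alone.

```python
def tile_grid(grid, tile_rows, tile_cols):
    """Yield ((row0, col0), values) for each tile of a 2-D list of numbers."""
    n_rows, n_cols = len(grid), len(grid[0])
    for r0 in range(0, n_rows, tile_rows):
        for c0 in range(0, n_cols, tile_cols):
            values = [
                grid[r][c]
                for r in range(r0, min(r0 + tile_rows, n_rows))
                for c in range(c0, min(c0 + tile_cols, n_cols))
            ]
            yield (r0, c0), values


def summarize(values):
    """Precompute the statistics that are cheap to combine later."""
    return {
        "count": len(values),
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
    }


def build_summaries(grid, tile_rows, tile_cols):
    """Map each tile origin to its summary (a stand-in for a tile store)."""
    return {
        origin: summarize(values)
        for origin, values in tile_grid(grid, tile_rows, tile_cols)
    }


def global_mean(summaries):
    """Combine per-tile sums and counts without touching the raw values."""
    total = sum(s["sum"] for s in summaries.values())
    count = sum(s["count"] for s in summaries.values())
    return total / count


grid = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
summaries = build_summaries(grid, 2, 2)
```

Count, sum, minimum and maximum combine across tiles, which is why they suit the "easy statistics" case; statistics that cannot be merged from summaries would still need the underlying tiles.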
    > == DOMS ==
    > The Distributed Oceanographic Match-Up Service
    > DOMS is designed to reconcile satellite and in situ datasets in support of
    > NASA's Earth Science mission. The service will provide a mechanism for
    > users to input a series of geospatial references for satellite observations
    > and receive the in situ observations that are matched to the satellite data
    > within a selectable temporal and spatial domain. DOMS includes several
    > characteristic in situ and satellite observation datasets - with an initial
    > focus on salinity, sea temperature, and winds. DOMS will be used by the
    > marine and satellite research communities to support a range of activities
    > and several use cases will be described. The service is designed to provide
    > a community-accessible tool that dynamically delivers matched data and
    > allows the scientist to only work with the subset of data where the matches
    > exist.
    >
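The DOMS match-up described above (pairing each satellite observation with the in situ observations that fall within a selectable temporal and spatial window) can be sketched in a few lines. This is an illustration only, not the DOMS implementation: the record layout, the names `haversine_km` and `match_up`, and the sample coordinates are hypothetical, and a brute-force scan stands in for whatever indexing the real service uses.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_up(satellite_obs, in_situ_obs, max_km, max_dt):
    """For each satellite observation, collect the in situ observations
    within max_km kilometres and within max_dt of its timestamp."""
    matches = []
    for sat in satellite_obs:
        hits = [
            obs for obs in in_situ_obs
            if abs(obs["time"] - sat["time"]) <= max_dt
            and haversine_km(sat["lat"], sat["lon"],
                             obs["lat"], obs["lon"]) <= max_km
        ]
        matches.append((sat, hits))
    return matches


# Hypothetical sample records.
satellite = [
    {"id": "S1", "lat": 30.0, "lon": -80.0,
     "time": datetime(2017, 10, 1, 12, 0)},
]
in_situ = [
    {"id": "A", "lat": 30.1, "lon": -80.1,
     "time": datetime(2017, 10, 1, 13, 0)},   # close in space and time
    {"id": "B", "lat": 35.0, "lon": -80.0,
     "time": datetime(2017, 10, 1, 12, 30)},  # too far away
    {"id": "C", "lat": 30.0, "lon": -80.0,
     "time": datetime(2017, 10, 2, 12, 0)},   # too late
]
matches = match_up(satellite, in_situ, max_km=25.0,
                   max_dt=timedelta(hours=3))
```

With a 25 km and 3 hour window, only record "A" matches satellite observation "S1", so a scientist would receive just that subset.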
    > == MUDROD ==
    > Mining and Utilizing Dataset Relevancy from Oceanographic Datasets to
    > Improve Data Discovery and Access
    > Data discovery accuracy is a challenging topic for both Earth science and
    > other domains. It is especially true for scientific data sets that are not
    > as popular as Amazon or Google data. MUDROD is focused on mining oceanic
    > knowledge from the PO.DAAC user log files to improve the end user data
    > discovery experience at PO.DAAC. There are three steps in the research: a)
    > extracting the oceanographic semantics from three resources: SWEET, the
    > GCMD ontology, and the keywords used by end users for searching PO.DAAC
    > datasets, b) mining the linkage among different vocabularies based on user
    > data discovery sessions, and c) building the linkage among vocabularies based
    > on a comprehensive approach that considers domain de facto standards, e.g.,
    > SWEET and GCMD, and the knowledge mined from the log files. The semantics
    > are used to improve data discovery for ranking results, navigating among
    > vocabularies, and recommending data based on user searches.
    >
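Step (b) of the MUDROD description above, mining the linkage among vocabularies from user discovery sessions, can be illustrated with a simple co-occurrence count. This is a sketch under stated assumptions rather than MUDROD's method: the names `term_cooccurrence` and `related_terms` and the sample sessions are hypothetical, and raw pair counts stand in for the project's actual mining approach.

```python
from collections import Counter
from itertools import combinations


def term_cooccurrence(sessions):
    """Count how often two search terms appear in the same session."""
    counts = Counter()
    for terms in sessions:
        # Sort the de-duplicated terms so each pair has one canonical order.
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts


def related_terms(counts, term, top_n=3):
    """Return the terms most often seen in a session together with `term`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    return [t for t, _ in scores.most_common(top_n)]


# Hypothetical search sessions reconstructed from user logs.
sessions = [
    ["sea surface temperature", "sst", "ghrsst"],
    ["sst", "ghrsst"],
    ["ocean wind", "sst"],
    ["ocean wind", "quikscat"],
]
counts = term_cooccurrence(sessions)
```

In this toy data "ghrsst" shares two sessions with "sst", so it ranks first among the terms related to "sst".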
    > = Current Status =
    > All components of SDAP were originally designed and developed under grants
    > from the NASA-funded Advanced Information Systems and Technologies (AIST)
    > program. The initiative to bring the components together under the
    > SDAP umbrella was granted through an AIST-funded follow-on grant which will
    > run for approximately another 18 months.
    > Currently no projects have made official releases so outside of community
    > building, this will be our primary Incubating goal. All SDAP source code is
    > currently publicly available and licensed under the ALv2.0.
    >
    > = Meritocracy =
    > The current developers are familiar with meritocratic open source
    > development at Apache. The SDAP team consumes Apache products heavily with
    > members being part of several Apache user communities. SDAP itself has
    > critical dependencies upon Apache products. Lewis McGibbney (JPL employee),
    > a Member of the ASF and V.P. of Apache Any23, Gora PMC Nutch, Tika, OODT,
    > OCW, etc., is championing the effort to bring SDAP into and through the
    > Apache Incubator and has been evangelizing the Apache Way to the current
    > SDAP contributors such that the meritocratic process is well understood and
    > followed. Apache was chosen specifically because we want to encourage this
    > style of community development for the project and for it to sustain SDAP
    > forward to become the generic platform for the next generation of big
    > science data solutions.
    >
    > = Community =
    > The SDAP project is a fairly new effort and our community is not yet
    > fully/firmly established. Initial committers comprising the SDAP roster
    > have only recently fully come together as a unified team however there is a
    > large degree of synergy between constituent members at JPL, FSU, NCAR, and
    > GMU. Therefore, community building and publicity continues to be a major
    > thrust. With the activity and exposure regularly attained by several
    > community members, we hope to grow the SDAP presence in and across several
    > (scientific) forums. The SDAP technology is generating interest within
    > communities such as the Earth Science Information Partnership (ESIP),
    > American Geophysical Union (AGU) and a plethora of science meetings around
    > the globe. This in effect, we hope, will further contribute towards the
    > possibility of SDAP being used across Government Agencies such as NASA,
    > NOAA, USGS, EPA, DOI, etc. as well as by researchers and students in
    > academic institutions around the globe.
    > During incubation, we will explicitly seek to increase our adoption, with
    > SDAP already being featured on the agenda for several high profile globally
    > significant scientific conferences and meetings.
    >
    > = Core Developers =
    > The current set of core developers is relatively small, including
    > full-time staff and students from across JPL, FSU, NCAR, and GMU. Initial
    > community management and participation will be distributed across the
    > entire team, most of whom have been involved with the constituent projects
    > for less than two years.
    >
    > = Alignment =
    > All SDAP code is licensed under Apache v2.0.
    >
    > = Known Risks =
    >
    > == Orphaned products ==
    > There are currently no orphaned products. Each component of SDAP has
    > dedicated personnel leading and participating in its ongoing development.
    > Additionally, there is substantial collaboration between projects
    > facilitated by regular project meetings which are specific to the initial
    > member entities and focused on advancing physical oceanographic science.
    >
    > == Inexperience with Open Source ==
    > JPL (in particular Lewis McGibbney) has been part of several efforts to
    > transition to and grow project communities at Apache e.g. Apache OODT,
    > Apache Open Climate Workbench, Apache Joshua (Incubating), Apache SensSoft
    > (Incubating), Apache DRAT (Incubating). Most of the code developed under
    > the SDAP umbrella was and is open source prior to the Incubator effort so
    > we are well familiarized with the nuances of open source software.
    >
    > = Relationships with Other Apache Products =
    > SDAP has strong dependency upon a number of high profile and smaller
    > profile Apache products. Examples can be seen in the breakdown of External
    > Dependencies. As we continue to grow SDAP within the Incubator, we will
    > make efforts to share community stories, software advancements and possible
    > improvements in our use of our Apache dependencies back to those project
    > communities.
    >
    > = Developers =
    > The SDAP project, and hence its developers, is currently funded through a NASA
    > AIST follow-on grant with funding secured for the next ~18 months. There
    > are currently no developers dedicated 100% of their time; however, the same core
    > team that does the work currently will continue to work on the project
    > throughout the current funding period and after. There is currently no
    > business strategy aligned with SDAP; however, it is perceived that future,
    > as yet unsecured, funding may be directed to further feature advancement and
    > project evangelism.
    >
    > = Documentation =
    > Documentation is currently available in a number of locations e.g. Github
    > wiki, Github pages, etc. with each repository under the oceanworks-aist
    > Github Org maintaining documentation available through wikis attached to
    > the repositories. Additionally, most of the SDAP sub-projects have been
    > extensively documented within a plethora of formal academic publications
    > across several academic communities. It would be our intention, at the very
    > least, to unify the Github wiki and Github pages documentation, which would
    > most likely make up the sdap.apache.org Website content.
    >
    > = Initial Source =
    > Current source resides in several locations on Github and Bitbucket:
    >  * https://github.com/dataplumber/nexus (NEXUS, OceanXtremes, DOMS)
    >  * https://github.com/dataplumber/edge (EDGE)
    >  * https://github.com/aist-oceanworks/mudrod (MUDROD)
    >  * https://bitbucket.org/coaps_mdc/doms/src (DOMS)
    >
    > = External Dependencies =
    > Each component of the Science Data Analytics Platform has its own
    > dependencies. Documentation will be available for integrating them.
    >
    > == MUDROD ==
    > '''Core'''
    > ||'''GroupId'''||'''ArtifactId'''||'''Version'''||'''Scope'''||'''Type'''||'''Optional'''||
    > ||com.google.code.gson||gson||2.5||compile||jar||false||
    > ||org.jdom||jdom||2.0.2||compile||jar||false||
    > ||org.elasticsearch||elasticsearch||5.2.0||compile||jar||false||
    > ||org.elasticsearch||elasticsearch-spark-20_2.11||5.2.0||compile||jar||false||
    > ||joda-time||joda-time||2.9.4||compile||jar||false||
    > ||com.carrotsearch||hppc||0.7.1||compile||jar||false||
    > ||org.apache.spark||spark-core_2.11||2.1.0||compile||jar||false||
    > ||org.apache.spark||spark-sql_2.11||2.1.0||compile||jar||false||
    > ||org.apache.spark||spark-mllib_2.11||2.1.0||compile||jar||false||
    > ||org.scala-lang||scala-library||2.11.8||compile||jar||false||
    > ||org.codehaus.jettison||jettison||1.3.8||compile||jar||false||
    > ||commons-cli||commons-cli||1.2||compile||jar||false||
    > ||net.sf.opencsv||opencsv||2.3||compile||jar||false||
    > ||org.apache.jena||jena-core||3.3.0||compile||jar||false||
    > ||junit||junit||4.12||test||jar||false||
    >
    > '''Service'''
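    > ||'''GroupId'''||'''ArtifactId'''||'''Version'''||'''Scope'''||'''Type'''||'''Optional'''||
    > ||gov.nasa.jpl.mudrod||mudrod-core||0.0.1-SNAPSHOT||compile||jar||false||
    > ||javax.servlet||javax.servlet-api||3.1.0||provided||jar||false||
    > ||com.google.code.gson||gson||2.5||compile||jar||false||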
    >
    > '''Web'''
    >  * AngularJS - MIT License
    >  * BootstrapJS - MIT License
    >  * jQueryJS - MIT License
    >  * Underscore JS - MIT License
    >
    > == DOMS ==
    >  * Apache Solr version 5.5.1 http://lucene.apache.org/solr/
    >  * EDGE https://github.com/dataplumber/edge
    >  * NetCDF4 http://unidata.github.io/netcdf4-python/
    >  * Python 3.5 (NOTE: only partial support for py2.7)
    >
    > Non stdlib Python dependencies:
    >  * Jinja2==2.9.5
    >  * python-dateutil==2.6.0
    >  * cython==0.25.2
    >  * numpy==1.12.0
    >  * scipy==0.18.1
    >  * netCDF4==1.2.7
    >  * solrpy3
    >  * siphon==0.4.0
    >  * neo4j-driver==1.1.0
    >  * matplotlib==2.0.0
    >  * requests==2.13.0
    >  * shapely==1.5.17
    >  * flask==0.12
    >  * networkx==1.11
    >  * pyproj==1.9.5.1
    >  * blist==1.3.6
    >
    > == NEXUS ==
    > '''Analysis'''
    >  * https://github.com/dataplumber/nexus/blob/master/analysis/package-list.txt
    >  * https://github.com/dataplumber/nexus/blob/master/analysis/requirements.txt
    >
    > '''Client'''
    >  * https://github.com/dataplumber/nexus/blob/master/client/requirements.txt
    >
    > '''Climatology'''
    >  * matplotlib
    >  * numpy
    >  * netCDF4
    >  * pathos (https://pypi.python.org/pypi/pathos)
    >
    > '''Data-access'''
    >  * https://github.com/dataplumber/nexus/blob/master/data-access/requirements.txt
    >
    > '''Nexus-ingest'''
    > ''Dataset-tiler''
    >  * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/dataset-tiler/build/reports
    >
    > ''developer-box''
    >  * Just a collection of scripts/vagrant file used to stand up a developer
    > instance of nexus ingestion. No dependencies to report
    >
    > ''Groovy-scripts''
    >  * Collection of Groovy scripts that can be used as part of data
    > ingestion. They only rely on the standard Groovy library and the
    > ‘nexus-messages’ project
    >
    > ''Nexus-messages''
    >  * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/nexus-messages/build/reports
    >
    > ''nexus-sink''
    >  * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/nexus-sink/build/reports
    >
    > ''nexus-xd-python-modules''
    >  * https://github.com/dataplumber/nexus/blob/master/nexus-ingest/nexus-xd-python-modules/package-list.txt
    >  * https://github.com/dataplumber/nexus/blob/master/nexus-ingest/nexus-xd-python-modules/requirements.txt
    >
    > ''spring-xd-python''
    >  * only python standard libraries are used
    >
    > ''tcp-shell''
    >  * https://github.com/dataplumber/nexus/tree/master/nexus-ingest/tcp-shell/build/reports
    >
    > '''tools/deletebyquery'''
    >  * https://github.com/dataplumber/nexus/blob/master/tools/deletebyquery/requirements.txt
    >
    > = Required Resources =
    > Mailing Lists
    >  * private@sdap.incubator.apache.org
    >  * dev@sdap.incubator.apache.org
    >  * commits@sdap.incubator.apache.org
    >
    > Git Repos
    >  * https://git-wip-us.apache.org/repos/asf/incubator-nexus.git
    >  * https://git-wip-us.apache.org/repos/asf/incubator-doms.git
    >  * https://git-wip-us.apache.org/repos/asf/incubator-mudrod.git
    >
    > Issue Tracking
    >  * JIRA Science Data Analytics Platform (SDAP)
    >
    > Continuous Integration
    >  * Jenkins builds on https://builds.apache.org/
    >
    > Web
    >  * http://sdap.incubator.apache.org/
    >  * wiki at http://cwiki.apache.org
    >
    > = Initial Committers =
    > The following is a list of the planned initial Apache committers (the
    > active subset of the committers for the current repository on Github).
    >  * Lewis John McGibbney (lewismc@apache.org)
    >  * Vardis M. Tsontos (vardis.m.tsontos@jpl.nasa.gov)
    >  * Joseph C. Jacob (Joseph.C.Jacob@jpl.nasa.gov)
    >  * Ed Armstrong (edward.m.armstrong@jpl.nasa.gov)
    >  * Frank Greguska (greguska@jpl.nasa.gov)
    >  * Brian Wilson (brian.wilson@jpl.nasa.gov)
    >  * Chaowe Phil Yang (cyang3@gmu.edu)
    >  * Yongyao Jiang (yjiang8@gmu.edu)
    >  * Yun Li (yli38@gmu.edu)
    >  * Shawn R. Smith (smith@coaps.fsu.edu)
    >  * Jocelyn Elya (jelya@coaps.fsu.edu)
    >  * Mark Bourassa (bourassa@coaps.fsu.edu)
    >  * Thomas Cram (tcram@ucar.edu)
    >  * Thomas Huang (thomas.huang@jpl.nasa.gov)
    >  * Steven Worley (worley@ucar.edu)
    >  * Zaihua Ji (zji@ucar.edu)
    >
    > = Affiliations =
    > NASA JPL
    >  * Lewis John McGibbney (lewismc@apache.org)
    >  * Vardis M. Tsontos (vardis.m.tsontos@jpl.nasa.gov)
    >  * Joseph C. Jacob (Joseph.C.Jacob@jpl.nasa.gov)
    >  * Ed Armstrong (edward.m.armstrong@jpl.nasa.gov)
    >  * Frank Greguska (greguska@jpl.nasa.gov)
    >  * Thomas Huang (thomas.huang@jpl.nasa.gov)
    >  * Brian Wilson (brian.wilson@jpl.nasa.gov)
    >
    > George Mason University
    >  * Chaowe Phil Yang (cyang3@gmu.edu)
    >  * Yongyao Jiang (yjiang8@gmu.edu)
    >  * Yun Li (yli38@gmu.edu)
    >
    > Center for Ocean-Atmospheric Prediction Studies, Florida State University
    >  * Shawn R. Smith (smith@coaps.fsu.edu)
    >  * Jocelyn Elya (jelya@coaps.fsu.edu)
    >  * Mark Bourassa (bourassa@coaps.fsu.edu)
    >
    > Computational Information Systems Laboratory (CISL) / National Center for
    > Atmospheric Research (NCAR)
    >  * Thomas Cram (tcram@ucar.edu)
    >  * Zaihua Ji (zji@ucar.edu)
    >  * Steven Worley (worley@ucar.edu)
    >
    > = Sponsors =
    >
    > = Champion =
    > * Lewis McGibbney (NASA/JPL)
    >
    > = Nominated Mentors =
    >  * TBD
    >  * TBD
    >  * TBD
    >
    > = Sponsoring Entity =
    > The Apache Incubator
    >
    >
    > --
    > http://home.apache.org/~lewismc/
    > @hectorMcSpector
    > http://www.linkedin.com/in/lmcgibbney
    >
    
    
    
    -- 
    http://home.apache.org/~lewismc/
    @hectorMcSpector
    http://www.linkedin.com/in/lmcgibbney
    



---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

