incubator-general mailing list archives

From Stian Soiland-Reyes <soiland-re...@cs.manchester.ac.uk>
Subject Re: [Proposal] Taverna workflow
Date Thu, 25 Sep 2014 16:19:41 GMT
Proposal now moved to the Apache wiki:

https://wiki.apache.org/incubator/TavernaProposal

I just used copy-paste, so some mistakes may have been introduced;
feel free to correct them.


I will be away for 2 weeks, but my colleague Shoaib Sufi should by now
have signed up to this list and can assist with any questions during that period.



On 23 September 2014 13:43, Stian Soiland-Reyes
<soiland-reyes@cs.manchester.ac.uk> wrote:
> I hereby present the Apache Incubator proposal for the project Taverna.
>
>
> Also available in rich text in the Taverna wiki (with more hyperlinks!):
>
> http://dev.mygrid.org.uk/wiki/display/developer/Taverna+incubator+proposal
>
> (Could someone grant me access to edit the Incubator wiki pages? My
> wiki username is soilandreyes)
>
>
>
>
> # Abstract
>
> Taverna is an open source and domain-independent suite of tools used
> to design and execute data-driven workflows.
>
>
> # Proposal
>
> The Taverna suite includes:
>
> * Taverna Workbench, a Java-based desktop application for graphically
> composing, editing and executing workflows of distributed web services
> and local tools
> * Taverna Command Line Tool, which allows repeated execution of
> parameterized workflow definitions
> * Taverna Server, which provides a REST and SOAP API for executing workflows
> * Taverna Player, a Ruby-based web interface to the Server, which
> provides a high-level view of workflow executions and their results,
> and allows further integration with Ruby on Rails applications.
>
> Taverna can browse and combine different service types, allowing
> workflows to integrate steps of arbitrary REST and SOAP web services
> with command line tools (local and SSH) and scripts (Beanshell, R,
> Jython), and finally to visualize the results.
>
> The goal of the Taverna suite is to help researchers access
> distributed datasets and processing capabilities by constructing
> pipelines, and to simplify the execution of these pipelines in
> various environments.
>
> The Taverna suite of products is already successful and in wide use
> across different domains. The software is currently licensed as LGPL
> 2.1, with copyright owned by the University of Manchester. External
> contributors have all signed Apache-like CLAs.
>
>
> # Background
>
> Taverna workflows coordinate inputs and outputs between computational
> processes and Web Services. The workflow is designed in a graphical
> interface which shows the workflow as a series of boxes and arrows,
> representing processes and their data connections. The processes in
> a workflow can be command line tools or REST and WSDL Web Services,
> which are combined for steps such as data acquisition, filtering,
> cleaning, integration, analysis and visualization. Taverna calls these
> processes "services", as they are generally provided by remote
> (third-party) servers.
>
> These kinds of computational workflows, also known as pipelines and
> dataflows, focus on the movement of data rather than the execution
> order of the underlying processes. Features such as implicit
> iteration (where a list of input values causes multiple executions of
> a process) and parallel invocation (where independent processes are
> executed as soon as their data is available) are intrinsic to a
> dataflow system and require no particular constructs from the
> workflow designer.
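
To make the two dataflow features above concrete, here is a minimal Python sketch. It is not Taverna's engine or API: the function names `implicit_iteration` and `run_independent` are hypothetical, and Python's standard `concurrent.futures` thread pool stands in for Taverna's own scheduling. A step written for a single value is applied once per list element (implicit iteration), and steps with no data link between them are submitted together so that neither waits for the other (parallel invocation).

```python
from concurrent.futures import ThreadPoolExecutor


def implicit_iteration(process, value):
    """Apply a single-value process, iterating implicitly over (nested) lists.

    A list input makes the process run once per element, and the outputs
    keep the same list shape; any other input runs the process once.
    """
    if isinstance(value, list):
        return [implicit_iteration(process, item) for item in value]
    return process(value)


def run_independent(processes, inputs):
    """Run steps that have no data links between them concurrently.

    `processes` maps a step name to a single-value function, and `inputs`
    maps the same name to that step's input, which is already available.
    All steps are submitted before any result is awaited, so no step
    waits for an unrelated one to finish.
    """
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(implicit_iteration, process, inputs[name])
            for name, process in processes.items()
        }
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    # Two hypothetical steps with no data link between them.
    steps = {"upper": str.upper, "length": len}
    data = {"upper": ["gene", "protein"], "length": ["abc", "de"]}
    print(run_independent(steps, data))
    # {'upper': ['GENE', 'PROTEIN'], 'length': [3, 2]}
```

This sketch deliberately leaves out dependent steps and the combination of several list inputs into one process, which a real dataflow engine must also handle.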
>
> As a visual programming environment, Taverna aids collaboration on, and
> reuse of, workflows. At the highest level, a workflow represents the
> conceptual level of an analysis, allowing understanding, discussion
> and communication of the overall analysis protocol. More detail can be
> revealed and modified for individual steps. At the individual process
> level, the workflow defines execution specifics such as operations,
> parameters and command line tools.
>
> Sharing of the workflow definitions allows re-use and re-purposing of
> the computational analysis. During workflow execution, provenance can
> be collected from every step, allowing deep inspection of intermediate
> values for the purpose of debugging and validation.
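
A minimal sketch of per-step provenance capture, assuming a hypothetical in-memory `ProvenanceLog` class written in Python; Taverna's actual provenance model and storage are not shown here. Each step's input and output are recorded as it runs, so an intermediate value can be inspected afterwards when debugging or validating a run.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceLog:
    """Hypothetical in-memory record of what every step consumed and produced."""

    records: list = field(default_factory=list)

    def run_step(self, name, process, value):
        """Run one step and record its input and output before returning."""
        output = process(value)
        self.records.append({"step": name, "input": value, "output": output})
        return output

    def intermediate(self, name):
        """Return every recorded output of the named step, e.g. for debugging."""
        return [record["output"] for record in self.records if record["step"] == name]


if __name__ == "__main__":
    log = ProvenanceLog()
    cleaned = log.run_step("clean", str.strip, "  ACGT  ")
    log.run_step("count", len, cleaned)
    print(log.intermediate("clean"))  # ['ACGT']
```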
>
>
> # Rationale
>
> There is a strong need to lower the barrier to accessing datasets and
> computational resources widely available on the Internet, to increase
> their use by researchers who understand the computational steps needed
> to produce their results, but who are not necessarily expert
> programmers. Taverna has already proven successful and popular in a
> wide range of scientific disciplines.
>
>
> # Initial Goals
>
> * Transition mailing lists to Apache (keep existing subscribers, but
> invite more)
> * Taverna developer workshop (2014-10-30)
> * Prepare git repositories for move:
>   * Update headers/metadata to indicate Apache License 2.0
>   * Restructure git repositories
>   * Rename Maven groupIds to org.apache.taverna.*
>   * Rename packages to org.apache.taverna.*
>
> * Move Github repositories to Apache git
> * Automated builds in Apache's Jenkins
> * Update to latest releases of Apache dependencies
> * Propose updated release & testing procedure under Apache
> * Move website and documentation
>
> We intend to release only the current development version, Taverna 3.x
> (http://www.taverna.org.uk/developers/work-in-progress/taverna-3/), under
> the Apache umbrella. Version 3.0 has not yet been officially released;
> however, the Taverna 3.0 Command Line can be released almost "as-is"
> after migration. The Taverna 3.0 Server is of beta quality, while the
> Taverna 3.0 Workbench is at the alpha stage and would need to be
> stabilized for an initial beta release.
>
> * Before first release: Maven Central releases of Taverna support
> libraries (e.g. taverna-scufl2 and taverna-databundle)
> * First release: Apache Taverna Command Line 3.0 (OSGi-based)
> * Release: Apache Taverna Server 3.0
> * Release: Apache Taverna Workbench 3.0 beta
> * Provenance exchange with relevant Apache products (e.g. Apache
> CXF->Taverna->CouchDB)
> * Release: Apache Taverna Workbench 3.0
>
> It is not yet decided whether the current Workbench Editions
> (http://www.taverna.org.uk/download/workbench/2-5/) will be carried over
> to Taverna 3, or whether this can be solved with an "Install extra
> plugin" step on first start-up of Apache Taverna. In any case, we
> anticipate that some of these specialized editions will be maintained
> outside (but in collaboration with) the Apache project. This is
> particularly the case for the Astronomy edition, as it depends on
> several LGPL/GPL libraries and is maintained by the AstroTaverna team.
>
>
> # Current Status
>
> ## Meritocracy
>
> Taverna was initially created by the myGrid consortium in 2003. Since
> 2006, the majority of contributions to Taverna's core code-base, its
> architecture and direction have been led by staff at The University of
> Manchester and The European Bioinformatics Institute (EMBL-EBI).
>
> The project has benefited from a high degree of extension and
> integration by other developers, but mainly in the form of plugins
> (http://www.taverna.org.uk/documentation/taverna-2-x/taverna-2-x-plugins/)
> and integrations
> (http://www.taverna.org.uk/developers/work-in-progress/taverna-online/
> http://www.taverna.org.uk/download/associated-tools/).
>
> Taverna's developer community has unfortunately not had a culture of
> submitting patches that would later warrant commit access, perhaps
> due to its background in the scientific community. However, contributors
> have been added as committers when their plugin became part of the
> core distribution (e.g. External Tool plugin by Möller and Krabbenhöft
> and AstroTaverna by Garrido), or when their development has required
> patches to the existing code base.
>
>
> ## Community
>
> Taverna has an active community of plug-in developers and users. The
> developer mailing list (taverna-hackers@lists.sourceforge.net) has 248
> members, the user mailing list (taverna-users@lists.sourceforge.net)
> has 370 members.
>
> As of 19 August 2014, 1500 users have registered. Total downloads of
> all products since version 2.1 (released December 2009) number 35000.
>
> A Taverna Developer workshop is being arranged for 30 October 2014 to
> bring together developers and integrators of Taverna. We want to
> encourage plug-in developers to also participate in the core
> development of Taverna, by introducing them to the code base and how
> to contribute. http://dev.mygrid.org.uk/wiki/display/developer/Taverna+Open+Development+Workshop
>
> Active steps are being taken to grow the communities of users and
> developers by targeting specific research domains, such as the work by
> Kevin Benson on Taverna's use in the Heliophysics and Astrophysics
> community. Susheel Varma is increasing usage of Taverna within the
> Biomedical domain. Julián Garrido, through his work on AstroTaverna, is
> promoting Taverna within the IVOA Virtual Astronomy community. Sonja
> Holl and Björn Hagemeier are targeting high performance computing.
>
>
> ## Core Developers
>
> What we currently consider to be the core Taverna Team is (in
> alphabetical order):
>
> Christian Brenninkmeijer (University of Manchester)
> Donal Fellows (University of Manchester)
> Robert Haines (University of Manchester)
> Aleksandra Nenadic (University of Manchester)
> Dmitry Repchevsky (Barcelona Supercomputing Center)
> Stian Soiland-Reyes (University of Manchester)
> Shoaib Sufi (University of Manchester)
> Vadim Surpin (Institute for Information Transmission Problems in Moscow)
> Alan Williams (University of Manchester)
>
> The team consists of experienced developers who have worked on a
> multitude of projects, particularly on writing software to support
> scientists. The committers list (see below) additionally includes
> plugin developers whose contributions have become part of Taverna.
> Part of our desire to join the Apache Software Foundation is to
> recognise their effort and promote them to also being "core developers".
>
>
> ## Alignment
>
> Taverna dependencies include Apache Commons, Axis, Abdera, Batik, CXF,
> Derby, Felix, HttpComponents, Jena, log4j, Maven, POI, Velocity,
> Xerces, XMLBeans and Xalan. We use Tomcat for testing and deployment of
> the Taverna Server.
> As part of moving to Apache-compatible dependencies, Taverna will
> probably adopt OpenJPA to replace the LGPL-licensed Hibernate.
>
>
>
> # Known Risks
>
> ## Orphaned products
>
> Most of the core developers are from the myGrid team at the University
> of Manchester, but are funded through a series of projects (see
> http://www.mygrid.org.uk/projects/). Many of these projects incorporate
> Taverna, so the effort from Manchester is partially based on direct
> project requirements, but also partially a volunteer effort for
> project maintenance and general development. The myGrid team has
> guaranteed funding until 2017.
>
> The developers outside Manchester are generally funded for other
> activities, so their contribution to Taverna is to a greater extent a
> volunteer effort, although again project-specific requirements steer
> their effort (e.g. for a new Taverna plugin).
>
> One of the reasons for our desire to move to the Apache Software
> Foundation is to formalise this volunteering/contribution effort, so
> that it becomes obvious that the University of Manchester is not the
> only contributor to the core code base, thereby reducing the
> impression that Taverna is vulnerable to Manchester’s future funding
> and projects.
>
>
> ## Inexperience with Open Source
>
> Taverna has been an open-source project since its first release in
> 2003. Most of the contributors also have experience of working with
> and contributing to other open source projects (e.g. TCL, CXF, Jena),
> particularly as Taverna strongly relies on other open source tools.
> Most of the research projects in which myGrid members have
> participated produce open-source software.
>
>
> ## Homogeneous Developers
>
> The committers list includes many people from myGrid at the University
> of Manchester in the United Kingdom, but these developers have been
> working on a range of distributed and European projects in the field
> of scientific software (see http://www.mygrid.org.uk/projects/).
>
> The other developers on the committers list come from many different
> projects and institutions across the world, from Russia, Canada,
> Germany and Spain.
>
>
> ## Reliance on Salaried Developers
>
> Development for Taverna is mainly performed as part of the developers'
> salaried work, but funded through many different projects at several
> institutions (see above). These projects do not generally have
> "contribute to Taverna" as their main goal, so in many ways the
> effort is still volunteer-based: contributing to Taverna is a way to
> support one's own work.
>
> From our experience of running Taverna over the last 10 years, new
> contributors will continue to join as Taverna becomes an ingredient in
> new projects, while existing contributors more slowly fade out of
> their involvement. Existing contributors and users often provide the
> personal link to new contributors.
>
>
> ## Relationships with Other Apache Products
>
> Apache already contains projects that seem relevant to Taverna.
>
> Apache Pig https://pig.apache.org/ is a high-level language for
> creating Map-Reduce programs for Apache Hadoop. There are already
> third-party efforts to convert Taverna workflows to Hadoop and Pig
> (https://github.com/umaqsud/taverna-to-pig and
> https://github.com/schenck/taverna-to-hadoop), thus making Taverna a
> graphical interface for building Apache Pig workflows; part of the
> Apache Taverna effort would be to invite these projects to join.
>
> Apache Airavata http://airavata.apache.org/ is a software framework
> for executing and managing computational jobs and workflows on
> distributed computing resources. Taverna is concerned less with job
> coordination and more with the flow of data between services.
> Airavata's XBaya Workflow Suite can export workflows in the Taverna 1
> format SCUFL, and could be updated to work with Taverna 3's SCUFL2 format.
>
> Apache ODE https://ode.apache.org/ is a WS-BPEL workflow engine. BPEL
> as a workflow language is quite verbose compared to dataflow languages
> like Taverna, and is additionally bound to a particular protocol
> (SOAP). Nevertheless, a subset of Taverna workflows could in
> theory run on the Apache ODE engine, and the Taverna 3 Platform API
> has facilities for plugging in alternative workflow engines. We have
> previously considered Apache Hadoop as one such alternative engine for
> executing a different subset of workflows with local command line
> tools.
>
> Apache Storm http://storm.incubator.apache.org/ is a distributed
> realtime computation framework. Experiments are underway to use
> Taverna as a front-end for creating Apache Storm workflows (see
> http://markmail.org/message/zg5ylo2aucpwfc5j).
>
> Apache has several popular frameworks for building REST/SOAP web
> services (Apache CXF, Apache Clerezza), data services (Apache Jena,
> Apache Hive, Apache CouchDB) and specific workflow engines (Apache
> Oozie for Hadoop, Apache ODE for WS-BPEL). As a general REST and SOAP
> service client, Taverna can be used for combining, testing and
> demonstrating such services.
>
>
> ## An Excessive Fascination with the Apache Brand
>
> Taverna is a long-running project (since 2003) with an existing user-
> and developer base across the academic world. Our main motivation for
> moving to Apache is to further encourage an open development process
> and engage existing and new developers to contribute to the core code
> base. We also want to ensure long-term continuity of the Taverna
> products, and for their future direction to be decided by the whole
> Taverna community rather than by any single party involved.
>
>
>
> # Documentation
>
> Taverna's documentation is available from
> http://www.taverna.org.uk/documentation/taverna-2-x/, including an
> extensive user manual at
> http://www.mygrid.org.uk/dev/wiki/display/taverna/User+Manual and
> tutorials http://www.taverna.org.uk/documentation/taverna-2-x/tutorials/
> and videos http://www.taverna.org.uk/documentation/taverna-2-x/videos/.
>
> The developer documentation
> http://dev.mygrid.org.uk/wiki/display/developer/Developers+Guide
> includes tutorials
> http://dev.mygrid.org.uk/wiki/display/developer/Tutorials for working
> with Taverna's source code and creating plugins.
>
>
> # Initial Source
>
> Taverna's source code is available from the 'taverna' github team
> account: https://github.com/taverna/. These 85 git repositories
> reflect the current modules of Taverna's plugin system, after a recent
> transition from Google Code SVN at
> http://taverna.googlecode.com/svn/taverna/. The history of Taverna's
> code base goes back to CVS hosting at SourceForge
> (http://taverna.cvs.sourceforge.net/), which was transitioned to SVN
> as archived at http://taverna.googlecode.com/svn/archived/cvs2svn-2008-09-25/.
> Note that although reasonable steps have been taken to preserve commit
> history when moving between version control systems, this has not always
> been achieved when moving code between modules and refactoring larger
> Java packages. Some source files in git might therefore have initial
> commits like "Moved from /taverna/utils/trunk", referring to SVN paths.
>
> One of the reasons for the many repositories is that we rely on Apache
> Maven and a plugin system (OSGi-based since Taverna 3) in which
> different modules have different version numbers and release cycles
> (e.g. tags/branches). This is essential for Taverna's plug-in support,
> as the plug-ins depend on the semantic versioning of the APIs and
> required implementations.
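
OSGi bundles declare the package versions they import as version ranges, for example `[2.5,3.0)` (at least 2.5, below 3.0). The Python sketch below is a simplified, hypothetical checker, not the resolver in Apache Felix or Taverna: it compares only the numeric major.minor.micro parts and ignores qualifiers. It shows how a plugin compiled against one API version keeps resolving across minor and micro updates, but not across a major version change.

```python
def parse_version(text):
    """Parse 'major[.minor[.micro]]' into a comparable tuple.

    Simplification: anything after the third dot-separated part (an OSGi
    qualifier) is ignored, and missing parts default to 0.
    """
    parts = text.strip().split(".")[:3]
    numbers = [int(part) for part in parts]
    return tuple(numbers + [0] * (3 - len(numbers)))


def in_range(version, range_spec):
    """Check a version against an OSGi-style range such as '[2.5,3.0)'.

    '[' and ']' include the bound, '(' and ')' exclude it; a bare version
    such as '2.5' means 'at least 2.5' with no upper bound.
    """
    v = parse_version(version)
    spec = range_spec.strip()
    if spec[0] not in "[(":
        return v >= parse_version(spec)
    low_text, high_text = spec[1:-1].split(",")
    low, high = parse_version(low_text), parse_version(high_text)
    low_ok = v >= low if spec[0] == "[" else v > low
    high_ok = v <= high if spec[-1] == "]" else v < high
    return low_ok and high_ok


if __name__ == "__main__":
    # A hypothetical plugin built against the 2.5 API imports "[2.5,3.0)".
    print(in_range("2.5.1", "[2.5,3.0)"))  # True: micro update, still compatible
    print(in_range("3.0.0", "[2.5,3.0)"))  # False: new major version may break the API
```

Under this scheme a backwards-incompatible change, such as renaming packages, only fits in a new major version, which is why plugins depending on a 2.x API would need updating for 3.0.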
>
> However, we currently plan to merge repositories that have similar
> release cycles, greatly reducing the number of repositories.
>
> Taverna source code uses the following package names (and their child packages):
>
> net.sf.taverna - since Taverna 2
> uk.org.taverna - new from Taverna 3
> org.taverna (sic) - Taverna Server
>
> Some contributed code uses package names depending on their
> originating projects:
>
> org.purl.wf4ever.provtaverna
> org.biomart.martservice
>
> We intend to release only the upcoming Taverna 3.0 version under the
> Apache umbrella (not 2.x). As renaming packages is a
> backwards-incompatible change, semantic versioning rules
> (http://semver.org/) only allow it in a new major version; the
> transition period of the Apache Incubator would therefore be the best
> (and possibly only) chance to rename Java packages and Maven groupIds
> to org.apache.taverna.*. Under OSGi, packages and JARs go hand in hand
> (several JARs do not normally provide the same package), so any
> package rename would be done together with the repository restructuring.
>
>
> # Source and Intellectual Property Submission Plan
>
> Taverna source code from http://github.com/taverna/
>
> (c) University of Manchester.
> Signed Apache-like CLAs for all external contributors.
> The current license is LGPL 2.1 (and GPL 3 for one domain-specific
> download); as copyright holder, Manchester can change this to Apache
> License 2.0.
>
> taverna.org.uk domain - registrant University of Manchester
> http://www.taverna.org.uk/ content (c) University of Manchester
> http://dev.mygrid.org.uk/wiki/display/tav250/ Confluence wiki content
> (c) University of Manchester
> http://dev.mygrid.org.uk/wiki/display/developer Confluence wiki
> content (c) University of Manchester
>
> The details of intellectual property submission will be worked out
> together with myGrid project manager Shoaib Sufi and the University of
> Manchester's Contracts Office.
>
>
> # External Dependencies
>
> Taverna, as an integrating workflow system, has a fairly large number
> of dependencies: the latest 2.5.0 Core Workbench distribution has 517
> JARs (although many of those are duplicates in different versions).
>
> We intend our first Apache-based release to be Taverna 3,
> which has already reduced this dependency list.
>
> We have performed an analysis of Taverna 3's dependencies at
> http://dev.mygrid.org.uk/wiki/display/developer/Taverna+Dependencies,
> but this is not yet a complete list.
>
> A second analysis looks at the licenses of those dependencies at
> http://dev.mygrid.org.uk/wiki/display/developer/Third-party+licenses,
> where we have some incompatible (LGPL) dependencies. Most of these are
> resolvable as they are part of optional plugins to Taverna (e.g. R
> support, BioMart). The dependency on Hibernate requires some developer
> effort to be replaced with either Apache OpenJPA or a "No-SQL"
> solution.
>
>
> # Cryptography
>
> Taverna uses these cryptography dependencies:
>
> BouncyCastle
> OpenJDK builds with the default JCE full encryption policy (bundled in
> installer)
>
> Taverna uses these to provide an encrypted keystore (storing
> usernames/passwords and client certificates for third-party services
> accessed by the designed workflow) with a corresponding user interface,
> and additionally binds to Java's SSL support to provide UI and command
> line options for security interactions, e.g. accepting new server
> certificates, or asking for usernames/passwords for HTTP Basic
> authentication (which can then be stored in the keystore).
>
>
> # Required Resources
>
> Taverna currently relies on a mixture of infrastructure hosted for
> free by third-parties (e.g. Github, SourceForge, GoogleCode,
> Launchpad, Bitbucket) and infrastructure hosted by myGrid at
> University of Manchester (Jenkins, Jira, Confluence, Wordpress).
>
> ## Mailing lists
>
> Existing mailing lists for Taverna are hosted at Sourceforge with
> archives at markmail. See http://www.taverna.org.uk/about/
>
> commits@taverna.incubator.apache.org  (replacing
> taverna-cvs@lists.sourceforge.net)
> private@taverna.incubator.apache.org (replacing support@mygrid.org.uk
> - to a lesser degree as we would want to encourage openness)
> dev@taverna.incubator.apache.org (replacing
> taverna-hackers@lists.sourceforge.net, 240 members)
> users@taverna.incubator.apache.org (replacing
> taverna-users@lists.sourceforge.net, 370 members)
>
>
> ## Git repositories
>
> The Taverna community would prefer to keep using git and Github, and
> we would like to request experimental writable git repositories
> (http://www.apache.org/dev/writable-git) with mirroring to Github.
>
> The repositories would be named taverna-*, like the current repositories
> of the github team (https://github.com/taverna/). This repository
> organization follows the style of the cordova-* and couchdb-* git
> repositories.
>
> Exactly how repositories are split/merged is open for discussion - it
> is part of our current plan to reduce the number of repositories by
> merging common modules with a similar release cycle - this could be
> done at an early phase of the incubation period.
>
>
> ## Issue Tracking
>
> JIRA Taverna (TAV)
>
> Existing issues in Taverna 3's current JIRA
> (http://dev.mygrid.org.uk/issues/browse/T3) should be imported, but
> its current list of modules needs further agreement.
>
>
> ## Other Resources
>
> Wiki spaces in Confluence https://cwiki.apache.org/confluence -
> importing the most recent Taverna-related spaces and documentation
> from http://dev.mygrid.org.uk/wiki/spacedirectory/view.action?startIndex=24
> Jenkins - replacing myGrid Jenkins at http://build.mygrid.org.uk/ci/
> Maven repository at https://repository.apache.org/ - replacing myGrid
> artifactory http://repository.mygrid.org.uk/
> File-based web space for Plugin Update Site - replacing
> http://updates.taverna.org.uk/ and
> http://www.mygrid.org.uk/taverna/updates/
> Home pages - to be transitioned from http://www.taverna.org.uk/ (Wordpress)
> Binary distribution download hosting, about 8 GB per release,
> replacing http://www.taverna.org.uk/download/ (currently downloads are
> hosted by http://launchpad.net/ and https://bitbucket.org/)
>
>
> # Initial Committers
>
> The initial list of committers reflects the current list of active
> developers in the Github team: https://github.com/orgs/taverna/people
> (note that not all of these have made their membership public on
> Github).
>
>
> Alan R Williams <alan.r.williams@manchester.ac.uk>
> Aleksandra Nenadic <a.nenadic@manchester.ac.uk>
> Christian Y. Brenninkmeijer <brenninc@cs.man.ac.uk>
> David Withers <david.withers@gmail.com>
> Dmitriy Repchevsky <dmitry.repchevski@bsc.es>
> Donal K. Fellows <donal.k.fellows@manchester.ac.uk>
> Finn Bacall <finn.bacall@manchester.ac.uk>
> Hajo Nils Krabbenhöft <hajo@krabbenhoeft.de>
> Ian Dunlop <ian.dunlop@manchester.ac.uk>
> Ingo Wassink <I.H.C.Wassink@ewi.utwente.nl>
> Julián Garrido <jgarrido@iaa.es>
> Mark Wilkinson <markw@illuminae.com>
> Luke McCarthy <elmccarthy@gmail.com>
> Robert Haines <rhaines@manchester.ac.uk>
> Shoaib Sufi <shoaib.sufi@manchester.ac.uk>
> Steffen Möller <moeller@inb.uni-luebeck.de>
> Stian Soiland-Reyes <stian@soiland-reyes.com> (Apache CLA Signed)
> Stuart Owen <sowen@cs.manchester.ac.uk>
>
> In addition to the Core Team (mentioned earlier), this list also
> reflects Taverna's existing meritocracy, as it includes plugin
> developers whose contributions have been merged into the main code
> base. We acknowledge that not all of these are likely to continue as
> "Core" developers, but we would like to encourage that during the
> incubation process.
>
>
> # Affiliations
>
> The majority of the initial committers are employed by the University
> of Manchester as part of the myGrid team, including responsibilities for
> contributing to and supporting Taverna.
> http://www.mygrid.org.uk/about-us/people/core-mygrid-team/.
>
> Dmitriy Repchevsky is employed by the Barcelona Supercomputing Center,
> including responsibilities for contributing to Taverna. Steffen Möller
> is employed by University of Lübeck. Julián Garrido is employed by
> Instituto de Astrofísica de Andalucía.
>
>
> # Sponsor Champion
>
> Andy Seaborne
>
>
> # Nominated Mentors
>
> * Andy Seaborne
>
>
> # Sponsoring Entity
>
> The Incubator.
>
>
>
>
>
> Your feedback is very much welcome!
>
>
> --
> Stian Soiland-Reyes, myGrid team
> School of Computer Science
> The University of Manchester
> http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718
