taverna-dev mailing list archives

From Stian Soiland-Reyes <st...@apache.org>
Subject Re: GSoC 2016 Docker support for Taverna
Date Mon, 21 Mar 2016 16:45:01 GMT
On 21 March 2016 at 00:51, Nadeesh Dilanga <nadeesh092@gmail.com> wrote:

> First of all, apologies for the delayed response. I wanted to give myself a
> bit more time to understand and go through what Taverna is and what
> exactly the expected outcome of the project is (tutorials and related slide
> decks and also youtube videos were very helpful). Because this will be my
> one and only GSoC proposal and I want it to be perfect!

Thanks! It doesn't have to be perfect - just great! :-))

> 1. Taverna is a BPMN like(but more extensive and scoped more widely in
> features) workflow engine which has several ways of creating work flows and
> different interfaces of access them.

While I guess we don't like to be compared with BPMN, I think you are
correct. :)

>  2. When creating workflows, one major extension point to cater custom use
> cases is, to plug/create your own services/service types which is a great
> model IMHO. And this project is in fact to write an adapter(activity plugin
> which I believe is the executor of an invocation of a service) when some
> one needs to run something on Docker at some phase of his workflow.

Correct - thus one could have a workflow with multiple tools from
different docker images.

> if #2 is correct, can you please provide me an example of a use case which
> led to this project idea, because I feel I may be missing something here.
> Because IMHO, even for docker eventually it will be a service invocation
> from a workflow front, and what Taverna needs is some activity plugins that
> are aware of the particular transport protocols.

We already have the Tool activity, which allows you to run command line
tools - however such workflows are hard to share, as anyone receiving
them may not have that tool installed, or not in the same version/location.

While approaches like https://www.debian.org/devel/debian-med/ and
BioLinux have helped towards "How to get it installed" - it then moves
the requirement to a particular operating system, which in a way is

Docker solves the "How to consistently install this tool" problem -
and even works (almost) seamlessly from OS X and Windows. It adds nice
reproducibility aspects as you can mark the exact snapshot version of
the docker image you have used.
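
As a rough illustration of the "exact snapshot" point (this is not Taverna
code), a Docker-backed tool invocation could refer to the image by content
digest rather than by a mutable tag. In the Python sketch below the image
name example/tool, the all-zero digest and both helper names are made up;
only the docker run --rm form and the name@sha256:... reference syntax come
from Docker itself.

```python
import shlex


def pinned_image(name, digest):
    """Return an image reference pinned to one exact content digest."""
    if not digest.startswith("sha256:"):
        raise ValueError("expected a digest of the form sha256:<hex>")
    return f"{name}@{digest}"


def docker_run_argv(image_ref, tool_argv):
    """Build the argv for a one-off 'docker run' of a command line tool."""
    # --rm removes the container once the tool exits
    return ["docker", "run", "--rm", image_ref, *tool_argv]


# Hypothetical image name and placeholder digest, for illustration only
image = pinned_image("example/tool", "sha256:" + "0" * 64)
print(shlex.join(docker_run_argv(image, ["tool", "--version"])))
```

Recording the digest alongside the workflow would let someone else pull the
same image bytes later, which a tag like "latest" cannot guarantee.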

There are now also initiatives such as http://bioboxes.org/ (and to a
certain degree http://bio.tools/ ) which describe bioinformatics tools
as Docker images - thus these can in theory be used directly from

Perhaps part of the project would be to define a use case so we find
some actual command lines we want to run in a Taverna workflow - e.g.
to run HMMER for sequence alignment using
https://hub.docker.com/r/dockerbiotools/hmmer/ using sequences fetched
from an EBI web service?  I am not sure how much of the bioinformatics
side you would like to get into! :)
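
To make that use case a bit more concrete, here is a hedged Python sketch of
the plumbing such an activity might need: write the fetched sequences to a
host directory as FASTA, mount that directory read-only into the container,
and build the command line. Assumptions not confirmed anywhere in this
thread: that phmmer is on the PATH inside the dockerbiotools/hmmer image,
the /data mount point, the file names and the toy sequences. The sketch only
builds the argv; it does not call Docker or any EBI service.

```python
import tempfile
from pathlib import Path


def to_fasta(records):
    """Format (identifier, sequence) pairs as FASTA text."""
    return "".join(f">{ident}\n{seq}\n" for ident, seq in records)


def hmmer_argv(host_dir, query_name, db_name, image="dockerbiotools/hmmer"):
    """Build a 'docker run' argv that runs phmmer on files in host_dir."""
    # Mount the host directory read-only at /data inside the container
    mount = f"{Path(host_dir).resolve()}:/data:ro"
    return ["docker", "run", "--rm", "-v", mount, image,
            "phmmer", f"/data/{query_name}", f"/data/{db_name}"]


with tempfile.TemporaryDirectory() as workdir:
    # Toy sequences standing in for what a web service would return
    Path(workdir, "query.fasta").write_text(to_fasta([("q1", "MKTAYIAKQR")]))
    Path(workdir, "db.fasta").write_text(to_fasta([("s1", "MKTAYLAKQR")]))
    print(hmmer_argv(workdir, "query.fasta", "db.fasta"))
```

Tool output would come back on stdout, or through a second, writable mount
if the tool writes result files.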

> (example: http service hosted in Docker, Http activity plugin, Message
> Broker service hosted in Docker, you need AMQP,MQTT like activity plugin)

Yes, but I don't think we want to run many of those kinds of services
from Taverna. I was thinking more of running just command line tools
that happen to be packaged as Docker images.

> 3. Or the case is to invoke some composite applications that
> deployed/installed in Docker disregarding what the protocols are ?

No, this would get a bit more complex, so I would stay away from that
for the GSOC project - although of course the potential there is a very
interesting motivation as well.

I think this is what I described in

> if #3 is correct, what we run in the docker container can be another
> Taverna workflow. If that is the case your idea on "Save workflow as Docker
> image" will be a superb addition!.

Yes! It should then be possible! But... why? :) To run with an older
Taverna version?

One interesting thing, if there's also "Save workflow as Docker image",
would be that when such an image is added to a workflow, Taverna could
"unwrap" it and show the inner workflow.

With Docker there's a big danger of going down the "It's turtles all
the way down" recursion - hence I tried to scope the GSOC ideas to be
more concrete about running command line tools.

>  So with this, I would like to understand what Taverna community expect
> from "Invoking Docker from Taverna"  on this GSoC project. So that I can be
> more specific on my project proposal and make it the best project for this
> summer for Taverna.
> On Fri, Mar 18, 2016 at 7:18 AM, Stian Soiland-Reyes <stain@apache.org>
> wrote:
>> On 17 March 2016 at 15:22, alaninmcr <alaninmcr@googlemail.com> wrote:
>> >> I found Docker as an excellent solution for scaling, easy deployment and
>> >> obviously a hot topic these days in enterprises who want to implement
>> >> micro services based architecture/deployment for low footprint
>> >> servers/services.
>> >>
>> >> I presume the idea behind Docker support for Taverna is NOT from a micro
>> >> service standpoint, but more like from a packaging and deployment
>> >> perspective. Please correct me if I am wrong.
>> No, you are right in that our current Docker ideas would not be about
>> creating Taverna (or Taverna workflow) as a micro-service, but to use
>> Docker for execution.
>> A similar aspect could be to use Docker to start up a set of
>> microservices accompanying the Workflow, and then access them from
>> Taverna workflow using the existing WSDL and REST activities.
>> This is something that I am interested in within the
>> http://bioexcel.eu/ project - but is a bit more architecturally
>> challenging as it would mean things like dynamic port bindings in the
>> workflow configuration. It
>> I've tracked this as https://issues.apache.org/jira/browse/TAVERNA-941
>> but IMHO it would be a too big task for a GSOC project.
>> > There are two separate issues:
>> >
>> > https://issues.apache.org/jira/browse/TAVERNA-901 is to allow Taverna
>> > workflows to include steps that are tools that are inside docker containers.
>> > That would be deployment of an existing docker.
>> >
>> > https://issues.apache.org/jira/browse/TAVERNA-879 is to create docker
>> > containers for Taverna workflows. That is packaging and (because the
>> > containers will be part of a CWL workflow) deployment.
>> Nadeesh, I've added your interest to
>> https://cwiki.apache.org/confluence/display/TAVERNADEV/2016-03+GSOC+2016
>> but if you are more interested in packaging for Docker, then perhaps
>> we could look at the existing Docker wrapping of Taverna Server
>> https://hub.docker.com/r/taverna/taverna-server/
>> https://github.com/taverna-extras/taverna-server-docker
>> and consider doing something similar for our command line tools
>> "executeworkflow" and "tavlang".
>> That shouldn't take you too long - so you may want to prototype one of
>> TAVERNA-901 and TAVERNA-879 as well.
>> I know Dmitry used wsdl-generic as a command line tool as in
>> http://inb.bsc.es/documents/galaxygears/ which could also be
>> interesting as a Docker container (e.g. for running WSDL services
>> within a CWL workflow), but I am not sure where the source code for
>> that is (is that outside Apache, Dmitry?)
>> >> If that is the case, can you please clarify what is the current packaging
>> >> deployment model ?
>> For Taverna 2.5 we used install4j via Maven to package into an installer:
>> https://github.com/apache/incubator-taverna-commandline/blob/old/taverna-commandline-product-core-20141228/pom.xml#L1712
>> That's what made the installers we have at
>> https://taverna.incubator.apache.org/download/command-line-tool/
>> One packaging task we could consider for Taverna 3.0 is to update
>> https://github.com/apache/incubator-taverna-commandline/tree/master/taverna-commandline-product
>> to use install4j or similar to generate such installers also for
>> Taverna 3, which has a slightly different folder structure.
>> As an open source project we have 5 licenses for Install4j, but we
>> have not asked the author yet if this is still valid under Apache.
>> Now that we release under the Apache license instead of LGPL, we would
>> ironically be allowed to bundle the binary Oracle JRE rather than having to
>> use the open source OpenJDK builds.
>> But I'm afraid such a task would not involve Docker - as I think most
>> users of Taverna Command line would not have Docker (or even the right
>> Java version) installed.
>> > There is no current mechanism for packaging up something to run a specific
>> > Taverna workflow. You can run workflows from the command line tool or on a
>> > Taverna Server.
>> Making a recipe for generating Docker images for running a particular
>> Taverna Workflow could be interesting. We could then have "Save
>> workflow as Docker image" built into Taverna!
>> If you are thinking about such an idea, feel free to suggest it as a
>> new Jira task!
>> Overall - you don't have to pick exactly our ideas - you can be
>> inspired by them and will have to write your own proposal about what
>> work you propose to do (which should be reasonably scoped and
>> scheduled) and say how Apache Taverna would benefit.
>> Looking forward to hearing more about your ideas!
>> --
>> Stian Soiland-Reyes
>> Apache Taverna (incubating), Apache Commons RDF (incubating)
>> http://orcid.org/0000-0001-9842-9718

Stian Soiland-Reyes
Apache Taverna (incubating), Apache Commons RDF (incubating)
