oodt-dev mailing list archives

From Susana Sanchez Exposito <...@iaa.es>
Subject Re: help: OODT component for distributing data through WAN
Date Tue, 21 Feb 2017 09:26:03 GMT
Thanks again Tom,

So, it seems that the OODT component I am looking for is OODT
Workflow. I need to investigate how to use this component to
implement a data delivery service, so I would like to ask you for
documentation about it.

So far, I have installed Apache OODT (
https://cwiki.apache.org/confluence/display/OODT/RADiX+Powered+By+OODT) and
I have been playing around with the File Manager component, following this
document:

https://cwiki.apache.org/confluence/display/OODT/OODT+Filemgr+User+Guide

However, I could not find a similar document for the OODT Workflow component.
I have only found these wiki pages:

https://cwiki.apache.org/confluence/display/OODT/Workflow2+Quick+Start+Guide
https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide

I don't know the difference between Workflow1 and Workflow2, so I am not
sure if these are the guides that I should follow.

I have also found this tutorial:

https://oodt.apache.org/site_docs/cas-workflow/user/basic.html

But I think I need something more to start working with this
component, so if you could point me to other tutorials or documentation, I
would be very grateful.


Susana.



2017-02-17 16:50 GMT+01:00 Tom Barber <tom.barber@meteorite.bi>:

> Hi Susana,
>
> Aggregating this and the off-list email, you could technically connect the
> FM to the users' storage but that's probably not the correct way to go about
> it.
>
> OODT is a toolbox at the end of the day, so you pick the parts that enhance
> what you're already doing. One certainly seems to be the ingestion of data
> and capturing of metadata, which could be executed by the File Manager, and
> as such OODT would then be the gateway to the ingested files.
> Off the back of that you could then implement a workflow that would trigger
> post-ingestion, or on a timer, or whatever, and would then figure out what
> to do with your data.
>
> For example, process ingests new data -> triggers workflow -> workflow
> looks at new data and looks up the metadata for the new files -> workflow
> then fires up GridFTP client or whatever delivery mechanism you use to
> deliver files to the end user.
>
> Of course, in reality the workflow could be any number of steps and scale in
> many different ways, but that is one very simple OODT workflow overview.
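
The flow Tom outlines (new product ingested, metadata looked up, files handed
to a delivery tool) can be sketched roughly as below. This is only an
illustration in plain Python, not OODT code: the Subscription record and the
matches and plan_deliveries functions are hypothetical names, and in a real
deployment this logic would sit inside a workflow task. The only external
tool assumed is globus-url-copy, the GridFTP command-line client, called in
its basic "source URL, destination URL" form; the sketch builds the commands
but does not run them.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    """Hypothetical record: which metadata values a remote user wants."""
    user: str
    destination_url: str   # e.g. a gsiftp:// URL at the user's site
    wanted: dict           # metadata key -> required value


def matches(metadata: dict, wanted: dict) -> bool:
    """True when every wanted key is present in the metadata with that value."""
    return all(metadata.get(key) == value for key, value in wanted.items())


def plan_deliveries(product_path: str, metadata: dict, subscriptions: list) -> list:
    """Return one transfer command (as an argument list) per matching user."""
    commands = []
    for sub in subscriptions:
        if matches(metadata, sub.wanted):
            commands.append([
                "globus-url-copy",          # assumed delivery tool (GridFTP client)
                "file://" + product_path,   # absolute local path of the product
                sub.destination_url,
            ])
    return commands
```

Each returned argument list could then be passed to subprocess.run, or
replaced by whatever delivery mechanism is actually in use. Keeping the
filtering step separate from the transfer step means the per-user selection
can be checked without moving any data.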
>
> Tom
>
> On Fri, Feb 17, 2017 at 10:19 AM, Susana Sanchez Exposito <sse@iaa.es>
> wrote:
>
> > Thanks Tom,
> >
> > From your answer I guess that I can use the OODT component File Manager to
> > deliver large data products (from GBs to TBs) to users located remotely
> > (i.e., users that are globally distributed).
> >
> > I still have some doubts, let me add them between your lines:
> >
> > 2017-02-16 13:18 GMT+01:00 Tom Barber <magicaltrout@apache.org>:
> >
> > > Hi Susana
> > >
> > > Welcome to the OODT list, this is indeed the correct place to ask about
> > > OODT related stuff.
> > >
> > > How you deliver data, I guess often depends on your requirements, but OODT
> > > was certainly designed with that type of thing in mind.
> > >
> > > The file manager is very flexible in terms of storage and is a portal
> > > allowing for the ingestion of data products to a file store, this could be
> > > a folder on a disk, nfs mount or something else, an HDFS cluster, S3 or
> > > something completely different. So the system will ingest data into the
> > >
> >
> > Do you mean that I can connect the File Manager with the users' file
> > stores, so that when the File Manager stores the data products, in practice
> > what it would be doing is delivering the data products to the users?
> >
> > Given that the users' file stores would be located remotely (possibly across
> > high-latency networks), I would be worried about the performance of this
> > option.
> >
> > In addition, with this option I would not be able to select/filter which
> > data products are delivered to each user, based on the metadata of the
> > products.
> >
> >
> >
> >
> > > file manager either through an API call, a crawling service or something
> > > else. During this operation metadata from the ingested files is then
> > > extracted, for example if this were an image, you could extract EXIF data,
> > > GEO data etc and then store that in the catalogue alongside the ingested
> > > product.
> > >
> > > There is a basic UI for showing ingested products called Ops UI, but in
> > > reality for deployment as a service there would be a web interface written
> > > to integrate into whatever application or portal you are already using,
> > > which would then allow users to search for products via metadata or keys in
> > > the ingested data. From that search users could then do a range of things
> > > depending on what your requirements are, the simplest being clicking a link
> > > to download the product. But of course it could be triggering a workflow,
> > > copying the file somewhere else or whatever.
> > >
> > > Behind the File Manager is also the workflow manager, so another scenario
> > > might be to ingest files into the file manager, which in turn triggers a
> > > workflow which then distributes the ingested files to people automatically,
> > > or performs some post processing etc.
> > >
> >
> > Ok. So, I would need to implement this workflow in such a way that 1) it
> > selects/filters which data products will be delivered to each user and 2)
> > it sends the data products to the remote users, by means of efficient tools
> > for data movement (e.g. GridFTP).
> >
> >
> >
> > >
> > > Let us know if you have any further questions.
> > >
> >
> > Thanks again!
> >
> > Susana.
> >
> >
> >
> > >
> > > Tom
> > >
> > > On Thu, Feb 16, 2017 at 7:56 AM, Susana Sanchez <susanasanche@gmail.com>
> > > wrote:
> > >
> > > > Dear all,
> > > >
> > > > I am trying to find out which of the components of Apache OODT is the most
> > > > suitable for delivering large data products to users located remotely
> > > > (users distributed across a WAN).
> > > >
> > > > I have read that the CAS File Manager has the capability to archive a file
> > > > to a remote location, so it could be a candidate. However, it seems this
> > > > component was not designed for this purpose, so it is not recommended for
> > > > distributing data through a WAN. Is that correct?
> > > >
> > > > I think the components that I am looking for are the Grid product services
> > > > (Product server/client, Profile server/client, Query server/client). Am I
> > > > right?
> > > > If not, I would like to ask you to provide some information about which
> > > > OODT components I need to distribute data products through international
> > > > networks.
> > > >
> > > > I was not sure if this is the correct email list to send this kind of
> > > > question. If not, sorry about that, and it would be appreciated if you
> > > > could forward it to the appropriate email address.
> > > >
> > > > Thanks in advance,
> > > > Susana.
> > > >
> > >
> >
> >
> >
> > --
> > Susana Sánchez Expósito
> >
> > Instituto de Astrofísica de Andalucía - CSIC
> > Glorieta de la Astronomía, s/n. E-18008, Granada
> > Tel:(+34) 958 121 311 / (+34) 958 230 635
> > Fax:(+34) 958 814 530
> > e-mail: sse@iaa.es
> >
>



-- 
Susana Sánchez Expósito

Instituto de Astrofísica de Andalucía - CSIC
Glorieta de la Astronomía, s/n. E-18008, Granada
Tel:(+34) 958 121 311 / (+34) 958 230 635
Fax:(+34) 958 814 530
e-mail: sse@iaa.es
