airavata-dev mailing list archives

From Sanjaya Medonsa <sanjaya...@gmail.com>
Subject Re: Apache Airavata-OODT Integration
Date Sat, 15 Jun 2013 11:04:08 GMT
Thanks Chris for your help! The working directory is available in the
JobExecutionContext in Airavata and can easily be retrieved.
The issue in my case is that the XBaya GUI takes a product id as input, not
the file name. Internally, the file stager queries the file manager using the
product id to retrieve the product reference and the corresponding file name,
then stages the file into the input dir. Since this product id to file name
mapping happens internally during file staging, my implementation doesn't
have access to the file name unless I query the file manager to retrieve the
corresponding file name using the product id.
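
To make the mapping concrete, here is a minimal sketch in Java of how the staged path could be derived as [JobInputDir]/[Filename] once the file name for a product id is known. The ProductNameLookup interface and the StagedPathSketch class are hypothetical names used only for illustration; they are not part of the OODT or Airavata APIs, and the lookup stands in for whatever query is made against the file manager.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class StagedPathSketch {

    /** Hypothetical stand-in for a file manager query; not an OODT API. */
    interface ProductNameLookup {
        Optional<String> fileNameFor(String productId);
    }

    /** Returns [jobInputDir]/[fileName] for the product id, or empty if the id is unknown. */
    static Optional<Path> stagedPath(Path jobInputDir, String productId, ProductNameLookup lookup) {
        return lookup.fileNameFor(productId).map(name -> jobInputDir.resolve(name));
    }

    public static void main(String[] args) {
        // An in-memory map plays the role of the catalog (assumption for this sketch).
        Map<String, String> catalog = new HashMap<>();
        catalog.put("product-001", "granule.dat");
        ProductNameLookup lookup = id -> Optional.ofNullable(catalog.get(id));

        Path inputDir = Paths.get("job-42", "input");
        System.out.println(stagedPath(inputDir, "product-001", lookup));
        System.out.println(stagedPath(inputDir, "unknown-id", lookup));
    }
}
```

Keeping the lookup behind an interface means the same path computation would work whether the file name comes from a second file manager query or from metadata recorded during staging.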

One of the major issues in my implementation seems to be that I use the OODT
product id as input, not the file name. Should I change my implementation to
use the file name instead of the product id?

Best Regards,
Sanjaya


On Fri, Jun 14, 2013 at 8:51 PM, Mattmann, Chris A (398J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Hey Sanjaya,
>
> Easy, see the attached PGEConfig.xml here:
>
> http://paste.apache.org/6OGW
>
> In that file:
>
> 1. We compute the staged file path by computing JobDir
> 2. We create in the exe block a staged input dir
> 3. We stage the files just using cps in the exeBlock (could have
> just as easily used fileStager)
> 4. We know that the file is [JobInputDir]/[Filename]
>
> HTH.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
>
> -----Original Message-----
> From: Sanjaya Medonsa <sanjayamrt@gmail.com>
> Reply-To: "dev@airavata.apache.org" <dev@airavata.apache.org>
> Date: Friday, June 14, 2013 5:02 AM
> To: Airavata Dev <dev@airavata.apache.org>
> Subject: Re: Apache Airavata-OODT Integration
>
> >Thanks Chris for your input. I actually use the PGETaskInstance for file
> >staging with minimal additional code. But my issue is not with the file
> >staging. As per my current implementation, the application takes a product
> >id as input. Then, using the capabilities in the PGETaskInstance class, it
> >does the file staging. My issue is that during the file staging the product
> >is mapped to a file in the specified working directory. I don't have a way
> >to retrieve the staged file name, as it is not recorded in Metadata (for
> >this purpose, I query the FileManager again to get the corresponding
> >reference name for a given product id). I need the staged file path, since
> >I replace the input product id with the staged file path prior to the
> >actual workflow invocation. Basically I am looking for some implementation
> >where I can easily retrieve the staged file path for a given product id.
> >
> >Cheers,
> >Sanjaya
> >
> >
> >On Wed, Jun 12, 2013 at 10:04 PM, Mattmann, Chris A (398J) <
> >chris.a.mattmann@jpl.nasa.gov> wrote:
> >
> >> Hi Sanjaya,
> >>
> >> -----Original Message-----
> >>
> >> From: Sanjaya Medonsa <sanjayamrt@gmail.com>
> >> Reply-To: "dev@airavata.apache.org" <dev@airavata.apache.org>
> >> Date: Monday, June 10, 2013 5:20 PM
> >> To: "dev@airavata.apache.org" <dev@airavata.apache.org>
> >> Cc: "dev@oodt.apache.org" <dev@oodt.apache.org>
> >> Subject: Re: Apache Airavata-OODT Integration
> >>
> >> >Hi Chris,
> >> >       On configuration, I have got rid of all the configuration files,
> >> >including pge-config.xml. All the required configurations are set
> >> >programmatically. Configurations such as the FileManagerServer URL are
> >> >configured in the airavata-server.properties file. I'll update the
> >> >review request with the modified details.
> >>
> >> Great work!
> >>
> >>
> >> >       Still I am not quite clear on how to retrieve the staged file path
> >> >properly. Currently I am using the getStagedFilePath method
> >> >in ApacheAiravataWorkFlowInstanceImpl to regenerate the staged file
> >> >path. While going through the OODT code, I have seen a method in
> >> >DataTransferer to notify the FileManagerServer once a transfer is
> >> >completed. But I couldn't see the same for product retrieval.
> >>
> >> Example:
> >>
> >>
> >> http://svn.apache.org/repos/asf/oodt/trunk/pge/src/test/resources/pge-config.xml
> >>
> >>
> >> Review Board tickets:
> >> https://reviews.apache.org/r/4746/
> >>
> >> https://reviews.apache.org/r/5382/
> >>
> >>
> >> JIRA issue source (in OODT since 0.4):
> >>   https://issues.apache.org/jira/browse/OODT-443
> >>
> >>
> >> >       As you suggested I'll improve my workflow using Apache Tika. I'd
> >> >like to continue this as a parallel task. While modifying the staging
> >> >implementation based on community feedback, I am currently looking at
> >> >ingesting output back to OODT.
> >>
> >> See above for info on file staging. I would strongly encourage you not
> >> to reimplement CAS-PGE in Airavata -- it's pretty functional and
> >> expressive anyways and I would work to figure out how to make Airavata
> >> leverage CAS-PGE.
> >>
> >> Cheers,
> >> Chris
> >>
> >> >
> >> >
> >> >
> >> >On Wed, Jun 5, 2013 at 12:11 AM, Mattmann, Chris A (398J) <
> >> >chris.a.mattmann@jpl.nasa.gov> wrote:
> >> >
> >> >> Hi Sanjaya,
> >> >>
> >> >> I think starting out with /bin/ls would be good, maybe like a /bin/ls
> >> >> workflow, and then for each file returned, maybe run Apache Tika and
> >> >> extract its metadata and then pipe that to a file?
> >> >>
> >> >> How about that?
> >> >>
> >> >> Cheers,
> >> >> Chris
> >> >>
> >> >> -----Original Message-----
> >> >> From: Sanjaya Medonsa <sanjayamrt@gmail.com>
> >> >> Reply-To: "dev@airavata.apache.org" <dev@airavata.apache.org>
> >> >> Date: Tuesday, June 4, 2013 5:31 AM
> >> >> To: "dev@airavata.apache.org" <dev@airavata.apache.org>
> >> >> Cc: "dev@oodt.apache.org" <dev@oodt.apache.org>
> >> >> Subject: Re: Apache Airavata-OODT Integration
> >> >>
> >> >> >Hi Chris,
> >> >> >     Please see my comments below on the two items.
> >> >> >
> >> >> >Configuration : It should be possible to set them programmatically.
> >> >> >Actually I have partly implemented it for the file staging information.
> >> >> >I'll work to get rid of the other configuration files.
> >> >> >
> >> >> >Staged File Path : I'll work on the suggested approach, though I do not
> >> >> >fully understand it at the moment. I guess I need to go through CAS-PGE
> >> >> >a bit more and come back to you on the proposed approach.
> >> >> >
> >> >> >Currently I am testing this by wrapping the /bin/ls command as a GFac
> >> >> >service. I may need to test this with a real workflow. Could you please
> >> >> >give me some guidance on a better scenario to test this?
> >> >> >
> >> >> >Cheers,
> >> >> >Sanjaya
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >On Mon, Jun 3, 2013 at 8:17 PM, Mattmann, Chris A (398J) <
> >> >> >chris.a.mattmann@jpl.nasa.gov> wrote:
> >> >> >
> >> >> >> Hi Sanjaya,
> >> >> >>
> >> >> >> -----Original Message-----
> >> >> >>
> >> >> >> From: Sanjaya Medonsa <sanjayamrt@gmail.com>
> >> >> >> Reply-To: "dev@airavata.apache.org" <dev@airavata.apache.org>
> >> >> >> Date: Thursday, May 30, 2013 5:12 AM
> >> >> >> To: "dev@oodt.apache.org" <dev@oodt.apache.org>,
> >> >> >> "dev@airavata.apache.org" <dev@airavata.apache.org>
> >> >> >> Subject: Apache Airavata-OODT Integration
> >> >> >>
> >> >> >> >Hi,
> >> >> >> >     I have worked on the Apache Airavata integration with Apache
> >> >> >> >OODT. As a first step, I have implemented integration with the
> >> >> >> >Apache OODT file manager component.
> >> >> >>
> >> >> >> Great work!!
> >> >> >>
> >> >> >> Comments below:
> >> >> >>
> >> >> >> >      1. Introduced a new GFac Schema type called OODTProduct, which
> >> >> >> >takes Apache OODT product IDs as input.
> >> >> >> >      2. Implemented a new pre GFac Handler by extending the Apache
> >> >> >> >OODT PgeTaskInstance to stage the corresponding file into the
> >> >> >> >working directory.
> >> >> >> >      3. Once the file is staged, the input parameter with the OODT
> >> >> >> >product id is replaced with the path of the staged file for
> >> >> >> >downstream processing.
> >> >> >> >
> >> >> >> >I have tested the implementation with a GFac application which wraps
> >> >> >> >the /bin/ls command. The application takes a product id as input,
> >> >> >> >stages the corresponding file into the working directory, and /bin/ls
> >> >> >> >is executed against the staged file.
> >> >> >> >Hope this is a valid testing scenario.
> >> >> >> >
> >> >> >> >Concerns
> >> >> >> >- Configurations : I have added a new configuration file named
> >> >> >> >oodt-integration.properties in addition to the dynamic_metadata.met
> >> >> >> >and pge-config.xml files used by OODT. But at the moment there is no
> >> >> >> >item configured in oodt-integration.properties.
> >> >> >>
> >> >> >> You probably only need the pge-config.xml file. Dynamic metadata and
> >> >> >> the task configuration properties can be specified programmatically,
> >> >> >> right?
> >> >> >>
> >> >> >> >- Staged File Name - With the current implementation of
> >> >> >> >PgeTaskInstance it is not possible to retrieve the path of the staged
> >> >> >> >file. Due to this limitation, I query the FileManagerServer with the
> >> >> >> >product id, retrieve the file name, and compute the file path using
> >> >> >> >the working directory information.
> >> >> >>
> >> >> >> I'm not sure I understand this? If you store and record the Filename
> >> >> >> and FileLocation metadata files, then you can easily retrieve the
> >> >> >> staged file path via a SQLquery via CAS-PGE by simply setting the
> >> >> >> FORMAT=('$FileLocation/$Filename') in the response.
> >> >> >> Can you comment on this?
> >> >> >>
> >> >> >> >- Currently it is not possible to execute the workflow using XBaya
> >> >> >> >due to a validation failure caused by the new schema type. I have
> >> >> >> >commented out the relevant validation code for testing purposes.
> >> >> >>
> >> >> >> OK, will probably need to work on this.
> >> >> >>
> >> >> >> >
> >> >> >> >Currently I am having an issue with the review board client tool and
> >> >> >> >need to resolve it to upload the code for review.
> >> >> >>
> >> >> >> I see later that you got this working, so will head over and review
> >> >> >> that now.
> >> >> >>
> >> >> >> Thanks!
> >> >> >>
> >> >> >> Cheers,
> >> >> >> Chris
> >> >> >>
