oodt-dev mailing list archives

From Brian Foster <holeno...@me.com>
Subject Re: Staging CAS-PGE config file...
Date Thu, 07 Jun 2012 00:40:53 GMT
Okay... I totally got behind on this thread. The point is that, whether we are talking about
CAS-PGE running in the resource manager or some other generic resource manager job, a job
typically requires some set of files to exist before it runs, a temp directory to work in, and
temp directory cleanup afterwards. Currently CAS-PGE (or any other job) has to implement this
logic itself. It really should be controlled at a higher level, which would also avoid directory
collisions across jobs. If CAS-PGE needs a file from the filemgr, that is something CAS-PGE
should be responsible for. So, in relation to the emails below, pge-config.xml is the file that
needs to exist on, or be visible to, the machine before CAS-PGE is run. CAS-PGE really shouldn't
have to stage that file; it makes for a hacky implementation in CAS-PGE anyway.

I envision that such a change to the resource manager would include being able to specify an XML
file listing the files a job needs in order to run. At runtime the resource manager would
stage those files to the temp working directory it created for the job and then clean them
up after job execution. Something like:

<reqInput class="file.staging.class">
  <file src="/path/to/pge-config.xml" dest="path/relative/to/temp/working/dir/pge-config.xml"/>
</reqInput>
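
A minimal sketch of how a resource manager might act on such a file, written in Python for
brevity. The helper name `run_with_staged_inputs`, the `job` callable, and the check that `dest`
stays inside the working directory are assumptions for illustration and not part of any existing
OODT API; the `class` attribute is ignored here.

```python
import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def run_with_staged_inputs(req_input_xml, job):
    """Stage each <file> into a fresh temp working dir, run the job, clean up.

    Hypothetical helper: `job` is any callable that takes the working dir path.
    """
    root = ET.fromstring(req_input_xml)
    # One private directory per job avoids collisions across jobs.
    work_dir = Path(tempfile.mkdtemp(prefix="job-"))
    try:
        for entry in root.findall("file"):
            src = Path(entry.get("src"))
            dest = work_dir / entry.get("dest")
            # dest is relative to the working dir; refuse paths that escape it.
            if work_dir.resolve() not in dest.resolve().parents:
                raise ValueError(f"dest escapes working dir: {entry.get('dest')}")
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
        return job(work_dir)
    finally:
        # Cleanup happens whether the job succeeded or raised.
        shutil.rmtree(work_dir, ignore_errors=True)
```

The job itself never creates or removes the directory, so that logic lives in one place.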

You could imagine later extending it to support zip packages, which it could stage and
unzip:
<reqInput class="file.staging.class">
  <file src="/path/to/package.zip" dest="path/relative/to/temp/working/dir/package" postCopyHandler="unzip.logic.class"/>
</reqInput>
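
One way the `postCopyHandler` hook could work, again as a hedged Python sketch: the attribute
value is looked up in a registry of callables and applied to the staged file. The names
`POST_COPY_HANDLERS`, `stage_file`, and `unzip` are hypothetical, and replacing the staged zip
with a directory of the same name is an assumption about what `dest` means in the example above.

```python
import shutil
import zipfile
from pathlib import Path


def unzip(staged):
    """Replace a staged zip file with a directory holding its contents."""
    archive = staged.with_name(staged.name + ".zip.tmp")
    staged.rename(archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(staged)
    archive.unlink()


# Hypothetical registry: postCopyHandler attribute value -> callable.
POST_COPY_HANDLERS = {"unzip.logic.class": unzip}


def stage_file(src, dest, post_copy_handler=None):
    """Copy src to dest, then run the named post-copy handler, if any."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    if post_copy_handler is not None:
        POST_COPY_HANDLERS[post_copy_handler](dest)
    return dest
```

Keeping handlers in a lookup table means new post-copy steps can be added without touching the
copy logic.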

This would be ideal for cloud computing, since you could then package up your JDK, binaries,
etc., and the resource manager would make sure they were installed on the machine before
executing the job.

-brian

On May 01, 2012, at 11:11 PM, "Mattmann, Chris A (388J)" <chris.a.mattmann@jpl.nasa.gov> wrote:

Hey Brian,

Thanks, comments below:

On May 1, 2012, at 5:20 PM, Brian Foster wrote:

> hey guys,
> 
> in the wengine branched CAS-PGE, it supported staging CAS-PGE's XML config file to a
> tmp directory so it could be parsed and processed, and then the staged config file was
> copied to CAS-PGE's working directory (it had to be copied later since the working directory
> information is in the config file). I think this should be something the resource manager
> supports instead... staging job binaries and config that are needed to run the jobs would be
> a cleaner implementation than what wengine CAS-PGE does... CAS-PGE would still stage Products
> and ingest them itself (that is a CAS-PGE-specific task), however CAS-PGE's configuration
> file, which configures it, should already be there when it runs... otherwise you kinda need
> configuration for CAS-PGE configuration (chicken and egg problem)... what do you guys
> think?

Were you seeing this as resource manager functionality in terms of copying CAS-PGE's XML config
file? Which one, the pge-config.xml style file, or the cas-metadata file (dyn-met)? Also, does
CAS-PGE have to solve that problem? I mean I think I agree with you in the sense that this
functionality should be provided, but perhaps provided by WorkflowTaskJob (the Workflow
implementation of the Resource Manager job). The issue here is that that's the standard
interface between Workflow and Resource Manager, and to have a different one for CAS-PGE would
defeat the purpose of having CAS-PGE as a specialized WorkflowTask (which I think it is).

So, this one is a weird one. My gut feeling is to say -- does CAS-PGE even need to be that
meta? Isn't solving the file staging for input products enough, and then saying that the system
deployment has to be accessible via NFS, or HDFS, or some global mount point? We are still using
that paradigm fairly commonly, e.g., in the Snow project at JPL, in the Square Kilometre Array
efforts, and on EDRN.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

