oodt-dev mailing list archives

From Chris Mattmann <chris.mattm...@gmail.com>
Subject Re: how to pass arguments to workflow task that is external script
Date Tue, 07 Oct 2014 15:01:52 GMT
Thanks Val, I agree, yes, CAS-PGE is complex.

Did you see the learn by example wiki page:

https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example


I think it's pretty basic and illustrates what CAS-PGE does.

Basically, the gist of it is:

1. you only need to create a PGEConfig.xml file (see the first sketch
   below) that specifies:
  - how to generate input for your integrated algorithm
  - how to execute your algorithm (e.g., how to generate a script that
    executes it)
  - how to generate metadata from the output, and then how to crawl the
    files + met and get the outputs into the file manager

2. you go into workflow tasks.xml, define a new CAS-PGE type task, point
it at this config file, and provide the CAS-PGE task properties (see the
second sketch below). An example is here:
http://svn.apache.org/repos/asf/oodt/trunk/pge/src/main/resources/examples/WorkflowTask/
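
To make (1) concrete, here is a minimal, hand-written sketch of a
PGEConfig.xml. It follows the layout shown on the Learn by Example page
above, but the paths, the output regular expression, and the met file
writer class are placeholders I made up for illustration, and element and
attribute names can vary between OODT releases, so treat it as a sketch
rather than a drop-in file:

  <?xml version="1.0" encoding="UTF-8"?>
  <pgeConfig>
     <!-- how to execute the algorithm: CAS-PGE generates and runs this command -->
     <exe dir="[OODT_HOME]/data/jobs/jedi" shellType="/bin/sh">
        <cmd>[OODT_HOME]/pge/bin/my_algorithm.sh [InputFile] [OODT_HOME]/data/jobs/jedi/output</cmd>
     </exe>

     <!-- where the outputs land and how to pull metadata from them before crawling -->
     <output>
        <dir path="[OODT_HOME]/data/jobs/jedi/output" createBeforeExe="true">
           <!-- placeholder writer class: swap in your own met file writer -->
           <files regExp=".*\.dat$"
                  metFileWriterClass="com.example.MyMetFileWriter"/>
        </dir>
     </output>

     <!-- extra metadata that fills in the [bracketed] keys used above -->
     <customMetadata>
        <metadata key="InputFile" val="[OODT_HOME]/data/staging/input.dat"/>
     </customMetadata>
  </pgeConfig>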


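And for (2), a sketch of what the matching entry in workflow tasks.xml
might look like. org.apache.oodt.cas.pge.StdPGETaskInstance is the
standard CAS-PGE task class, but the exact property key names differ
between OODT releases (older releases use underscore-style keys such as
PGETask_ConfigFilePath, newer ones use slash-style keys), so copy the
exact keys from the WorkflowTask example linked above rather than from
this sketch:

     <task id="urn:oodt:jediPgeTask" name="Jedi PGE Task"
           class="org.apache.oodt.cas.pge.StdPGETaskInstance">
        <conditions/>
        <configuration>
           <!-- points CAS-PGE at the PGEConfig.xml sketched above;
                these key names are release-dependent, so check the linked example -->
           <property name="PGETask_ConfigFilePath"
                     value="[OODT_HOME]/extensions/policy/jedi-pgeconfig.xml" envReplace="true"/>
           <property name="PGETask_Name" value="JediPGE"/>
        </configuration>
     </task>
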
If you want to see a basic example of CAS-PGE in action, check out DRAT:

https://github.com/chrismattmann/drat/

It's a RADIX-based deployment with 2 CAS-PGEs (one for the MIME partition,
and another for RAT).

Check that out, see how DRAT works (and integrates CAS-PGE), and then let
me know if you are still confused and I will be glad to help more.

Cheers,
Chris

------------------------
Chris Mattmann
chris.mattmann@gmail.com




-----Original Message-----
From: "Mallder, Valerie" <Valerie.Mallder@jhuapl.edu>
Reply-To: <dev@oodt.apache.org>
Date: Tuesday, October 7, 2014 at 4:56 PM
To: "dev@oodt.apache.org" <dev@oodt.apache.org>
Subject: RE: how to pass arguments to workflow task that is external script

>Thanks Chris,
>
>CAS-PGE is pretty complex; I've read the documentation and it is still
>way over my head.  Is there any documentation or examples for how to
>integrate the crawler into it?  For instance, can I still use the
>crawler_launcher script?  Will the ExternMetExtractor and the
>postIngestSuccess ExternAction script that I created to work with the
>crawler still work "as is" in CAS-PGE?  Or should I invoke them
>differently?  What about the metadata that I extracted with the crawler?
>Do I have to redefine the metadata elements in another configuration file
>or policy file?  If there is any documentation on doing this, please
>point me to the right place, because I didn't see anything that addressed
>these kinds of questions.
>
>Thanks,
>Val
>
>Do I have to define these any differently in the PGE configuration?
>
>
>Valerie A. Mallder
>New Horizons Deputy Mission System Engineer
>Johns Hopkins University/Applied Physics Laboratory
>
>> -----Original Message-----
>> From: Chris Mattmann [mailto:chris.mattmann@gmail.com]
>> Sent: Tuesday, October 07, 2014 8:16 AM
>> To: dev@oodt.apache.org
>> Subject: Re: how to pass arguments to workflow task that is external
>>script
>>
>> Hi Val,
>>
>> Thanks for the detailed report. My suggestion would be to use CAS-PGE
>> directly instead of ExternScriptTaskInstance. That application is not
>> well maintained, doesn't produce a log, etc., etc., all of the things
>> you've noted.
>>
>> CAS-PGE, on the other hand, will (a) prepare input for your task; (b)
>> describe how to run your task (even as a script, and it will generate
>> that script); and (c) run met extractors and fork a crawler in your job
>> directory at the end.
>>
>> I think it's what you're looking for, and it's much better documented
>> on the wiki.
>>
>> Please check it out and let me know what you think.
>>
>> Cheers,
>> Chris
>>
>> ------------------------
>> Chris Mattmann
>> chris.mattmann@gmail.com
>>
>>
>>
>>
>> -----Original Message-----
>> From: "Mallder, Valerie" <Valerie.Mallder@jhuapl.edu>
>> Reply-To: <dev@oodt.apache.org>
>> Date: Monday, October 6, 2014 at 11:53 PM
>> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
>> Subject: how to pass arguments to workflow task that is external script
>>
>> >Hello,
>> >
>> >I'm stuck again :(  This time I'm stuck trying to start my crawler as a
>> >task using the workflow manager.  I am not using a PGE task right now.
>> >I'm just trying to do something simple with the workflow manager,
>> >filemgr, and crawler.  I have read all of the documentation that is
>> >available on the workflow manager and have tried to piece together a
>> >setup based on the examples, but things seem to be working differently
>> >now and the documentation hasn't caught up, which is totally
>> >understandable  and not a criticism. Just want you to know that I try
>> >to do my due diligence before bothering anyone for help.
>> >
>> >I am not running the resource manager, and I have commented out setting
>> >the resource manager url in the workflow.properties file so that
>> >workflow manager will execute the job locally.
>> >
>> >I am sending workflow manager an event (via the command line using
>> >wmgr-client) called "startJediPipeline". Workflow manager receives the
>> >event, and retrieves my workflow from the repository and tries to
>> >execute the first (and only) task, and then it crashes.  My task is an
>> >external script (the crawler_launcher script) and I need to pass
>> >several arguments to it. I've spent all day trying to figure out how to
>> >pass arguments to the ExternScriptTaskInstance, but there are no
>> >examples of doing this, so I had to wing it. I tried putting the
>> >arguments in the task configuration properties. That didn't work. So I
>> >tried putting the arguments in the metadata properties, and that hasn't
>> >worked. So, your suggestions are welcome!  Thanks so much.  Here's the
>> >error log, and the contents of my tasks.xml file follow it at the end.
>> >
>> >Workflow Manager started PID file
>> >(/homes/malldva1/project/jedi/users/jedi-pipeline/oodt-deploy/workflow/run/cas.workflow.pid).
>> >Starting OODT File Manager [  Successful  ]
>> >Starting OODT Resource Manager [  Failed  ]
>> >Starting OODT Workflow Manager [  Successful  ]
>> >slothrop:{~/project/jedi/users/jedi-pipeline/oodt-deploy/bin}
>> >Oct 06, 2014 5:48:30 PM org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManager loadProperties
>> >INFO: Loading Workflow Manager Configuration Properties from:
>> >[/homes/malldva1/project/jedi/users/jedi-pipeline/oodt-deploy/workflow/etc/workflow.properties]
>> >Oct 06, 2014 5:48:30 PM org.apache.oodt.cas.workflow.engine.ThreadPoolWorkflowEngineFactory getResmgrUrl
>> >INFO: No Resource Manager URL provided or malformed URL: executing jobs locally. URL: [null]
>> >Oct 06, 2014 5:48:30 PM org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManager <init>
>> >INFO: Workflow Manager started by malldva1
>> >Oct 06, 2014 5:48:41 PM org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManager handleEvent
>> >INFO: WorkflowManager: Received event: startJediPipeline
>> >Oct 06, 2014 5:48:41 PM org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManager handleEvent
>> >INFO: WorkflowManager: Workflow Jedi Pipeline Workflow retrieved for event startJediPipeline
>> >Oct 06, 2014 5:48:41 PM org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread checkTaskRequiredMetadata
>> >INFO: Task: [Crawler Task] has no required metadata fields
>> >Oct 06, 2014 5:48:42 PM org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread executeTaskLocally
>> >INFO: Executing task: [Crawler Task] locally
>> >java.lang.NullPointerException
>> >        at org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance.run(ExternScriptTaskInstance.java:72)
>> >        at org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread.executeTaskLocally(IterativeWorkflowProcessorThread.java:574)
>> >        at org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread.run(IterativeWorkflowProcessorThread.java:321)
>> >        at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
>> >        at java.lang.Thread.run(Thread.java:745)
>> >Oct 06, 2014 5:48:42 PM org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread executeTaskLocally
>> >WARNING: Exception executing task: [Crawler Task] locally: Message: null
>> >
>> >
>> >
>> >
>> ><cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>> ><!--
>> >  TODO: Add some examples
>> >-->
>> >
>> >   <task id="urn:oodt:crawlerTask" name="Crawler Task"
>> >         class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance">
>> >      <conditions/>  <!-- There are no pre-execution conditions right now -->
>> >      <configuration>
>> >          <property name="ShellType" value="/bin/sh" />
>> >          <property name="PathToScript" value="[OODT_HOME]/crawler/bin/crawler_launcher"/>
>> >      </configuration>
>> >      <metadata>
>> >          <args>
>> >             <arg>--operation</arg>
>> >             <arg>--launchAutoCrawler</arg>
>> >             <arg>--productPath</arg>
>> >             <arg>[OODT_HOME]/data/staging</arg>
>> >             <arg>--filemgrUrl</arg>
>> >             <arg>http://localhost:9000</arg>
>> >             <arg>--clientTransferer</arg>
>> >             <arg>org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory</arg>
>> >             <arg>--mimeExtractorRepo</arg>
>> >             <arg>[$OODT_HOME]/extensions/policy/mime-extractor-map.xml</arg>
>> >             <arg>--actionIds</arg>
>> >             <arg>MoveFileToLevel0Dir</arg>
>> >          </args>
>> >      </metadata>
>> >   </task>
>> ></cas:tasks>
>> >
>> >
>> >Valerie A. Mallder
>> >
>> >New Horizons Deputy Mission System Engineer
>> >The Johns Hopkins University/Applied Physics Laboratory
>> >11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
>> >240-228-7846 (Office) 410-504-2233 (Blackberry)
>> >
>>
>


