oodt-dev mailing list archives

From "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: what is batch stub? Is it necessary?
Date Mon, 13 Oct 2014 05:17:00 GMT
+1, we should definitely do this, Lewis.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Lewis John Mcgibbney <lewis.mcgibbney@gmail.com>
Reply-To: "dev@oodt.apache.org" <dev@oodt.apache.org>
Date: Wednesday, October 8, 2014 at 5:54 PM
To: "dev@oodt.apache.org" <dev@oodt.apache.org>
Subject: Re: what is batch stub? Is it necessary?

>Folks,
>Is it possible to create a parent issue for defining XSDs for all of the
>XML files we need in OODT?
>I do not know them all, but from this thread alone, it is clear that we
>could do with setting some kind of restrictions on what can be included
>within task and configuration XML within OODT.
>Thoughts?
>Lewis
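
To make the XSD proposal above concrete, here is a minimal sketch of what a schema for tasks.xml might look like. It is not an official OODT schema: it covers only the elements and attributes that appear in the tasks.xml quoted later in this thread, the type names (taskType, conditionsType, configurationType) are made up for illustration, and the shape of a non-empty <conditions> element (child <condition id="..."/> entries) is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch only; not shipped with OODT. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas"
           targetNamespace="http://oodt.jpl.nasa.gov/1.0/cas"
           elementFormDefault="unqualified">

  <!-- Root element <cas:tasks>; child elements stay unqualified,
       matching the tasks.xml quoted in this thread. -->
  <xs:element name="tasks">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="task" type="cas:taskType"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- A task carries an id, a display name, and an implementing class. -->
  <xs:complexType name="taskType">
    <xs:sequence>
      <xs:element name="conditions" type="cas:conditionsType" minOccurs="0"/>
      <xs:element name="configuration" type="cas:configurationType" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:string" use="required"/>
    <xs:attribute name="name" type="xs:string" use="required"/>
    <xs:attribute name="class" type="xs:string" use="required"/>
  </xs:complexType>

  <!-- Assumption: conditions are referenced by id; may be empty. -->
  <xs:complexType name="conditionsType">
    <xs:sequence>
      <xs:element name="condition" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>
          <xs:attribute name="id" type="xs:string" use="required"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>

  <!-- Configuration is a flat list of name/value properties. -->
  <xs:complexType name="configurationType">
    <xs:sequence>
      <xs:element name="property" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>
          <xs:attribute name="name" type="xs:string" use="required"/>
          <xs:attribute name="value" type="xs:string" use="required"/>
          <xs:attribute name="envReplace" type="xs:boolean" use="optional"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```

A schema along these lines would reject unknown attributes or misplaced elements at load time, which is the kind of restriction the proposal asks for. A real one would need to cover whatever else the Workflow Manager accepts in a task definition.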
>
>On Wed, Oct 8, 2014 at 5:44 PM, Verma, Rishi (398J) <
>Rishi.Verma@jpl.nasa.gov> wrote:
>
>> Hi Val,
>>
>> Yep - here's a link to the tasks.xml file:
>>
>> https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/policy/tasks.xml
>>
>> > The problem is that the ExternScriptTaskInstance is unable to recognize
>> > the command line arguments that I want to pass to the crawler_launcher
>> > script.
>>
>>
>> Hmm.. could you share your workflow manager log, or better yet, the
>> batch_stub output? Curious to see what error is thrown.
>>
>> Is a script file being generated for your PGE? For example, inside your
>> [PGE_HOME] directory, and within the particular job directory created for
>> your execution of a workflow, you will see some files starting with
>> "sciPgeExeScript_...". You'll find one for your pgeConfig, and you can check
>> to see what the PGE commands actually translate into, with respect to a
>> shell script format. If that file is there, take a look at it, and validate
>> whether the command works within the script (i.e. copy/paste and run the
>> crawler command manually).
>>
>> Another suggestion is to take a step back, and build up slowly, i.e.:
>> 1. Do an "echo" command within your PGE first (e.g.
>> <cmd>echo "Hello APL." > /tmp/test.txt</cmd>).
>> 2. If the above works, do an empty crawler_launcher command (e.g.
>> <cmd>/path/to/oodt/crawler/bin/crawler_launcher</cmd>) and verify that the
>> batch_stub or Workflow Manager prints some kind of output when you run the
>> workflow.
>> 3. Build up your crawler_launcher command piece by piece to see where it
>> is failing.
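
As an illustration of step 1 in the list above, a stripped-down pgeConfig might look like the sketch below. It reuses the <exe> attributes and the JobDir metadata key from Val's pgeConfig quoted later in this thread, and swaps the crawler_launcher command for the echo test. Whether CAS-PGE needs any further elements in this file is not confirmed here, so treat it as a starting point rather than a known-good configuration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<pgeConfig>

  <!-- Step 1: a trivial command, to confirm that the PGE task runs at all
       before the real crawler_launcher command is reintroduced. -->
  <exe dir="[JobDir]" shell="/bin/sh" envReplace="true">
    <cmd>echo "Hello APL." > /tmp/test.txt</cmd>
  </exe>

  <!-- JobDir is defined here because the <exe> element refers to it;
       the value is copied from the pgeConfig quoted in this thread. -->
  <customMetadata>
    <metadata key="JobDir" val="[OODT_HOME]"/>
  </customMetadata>

</pgeConfig>
```

If /tmp/test.txt appears after the workflow runs, the task wiring is working and the failure lies in the crawler_launcher command line itself. For step 2, the <cmd> body can then be replaced with the bare crawler_launcher path.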
>>
>> Thanks,
>> Rishi
>>
>> On Oct 8, 2014, at 4:24 PM, Mallder, Valerie <Valerie.Mallder@jhuapl.edu>
>> wrote:
>>
>> > Hi Rishi,
>> >
>> > Thank you very much for pointing me to your working example. This is
>> > very helpful. My pgeConfig looks very similar to yours. So, I commented
>> > out the resource manager like you suggested and tried running again without
>> > the resource manager. And my problem still exists. The problem is that the
>> > ExternScriptTaskInstance is unable to recognize the command line arguments
>> > that I want to pass to the crawler_launcher script. Could you send me a
>> > link to your tasks.xml file? I'm curious as to how you defined your task.
>> > My pgeConfig and tasks.xml are below.
>> >
>> > Thanks!
>> > Val
>> >
>> >
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <pgeConfig>
>> >
>> >   <!-- How to run the PGE -->
>> >   <exe dir="[JobDir]" shell="/bin/sh" envReplace="true">
>> >        <cmd>[CRAWLER_HOME]/bin/crawler_launcher --operation --launchAutoCrawler \
>> >        --filemgrUrl [FILEMGR_URL] \
>> >        --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
>> >        --productPath [JobInputDir] \
>> >        --mimeExtractorRepo [OODT_HOME]/extensions/policy/mime-extractor-map.xml \
>> >        --actionIds MoveFileToLevel0Dir</cmd>
>> >   </exe>
>> >
>> >   <!-- Files to ingest -->
>> >   <output>
>> >   </output>
>> >
>> > <!-- Custom metadata to add to output files -->
>> >   <customMetadata>
>> >      <metadata key="JobDir" val="[OODT_HOME]"/>
>> >      <metadata key="JobInputDir" val="[FEI_DROP_DIR]"/>
>> >      <metadata key="JobOutputDir" val="[JobDir]/data/pge/jobs"/>
>> >      <metadata key="JobLogDir" val="[JobDir]/data/pge/logs"/>
>> >   </customMetadata>
>> >
>> > </pgeConfig>
>> >
>> >
>> >
>> > <!-- tasks.xml **************************************************-->
>> >
>> > <cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>> >
>> >   <task id="urn:oodt:crawlerLauncherId" name="crawlerLauncherName" class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance">
>> >      <conditions/>  <!-- There are no pre execution conditions right now -->
>> >      <configuration>
>> >
>> >          <property name="ShellType" value="/bin/sh" />
>> >          <property name="PathToScript" value="[CRAWLER_HOME]/bin/crawler_launcher" envReplace="true" />
>> >
>> >          <property name="PGETask_Name" value="crawler_launcher PGE Task"/>
>> >          <property name="PGETask_ConfigFilePath" value="[OODT_HOME]/extensions/config/crawler-pge-config.xml" envReplace="true" />
>> >      </configuration>
>> >   </task>
>> >
>> > </cas:tasks>
>> >
>> > Valerie A. Mallder
>> > New Horizons Deputy Mission System Engineer
>> > Johns Hopkins University/Applied Physics Laboratory
>> >
>> >
>> >> -----Original Message-----
>> >> From: Verma, Rishi (398J) [mailto:Rishi.Verma@jpl.nasa.gov]
>> >> Sent: Wednesday, October 08, 2014 6:01 PM
>> >> To: dev@oodt.apache.org
>> >> Subject: Re: what is batch stub? Is it necessary?
>> >>
>> >> Hi Valerie,
>> >>
>> >>>>>> All I am trying to do is run "crawler_launcher" as a workflow task
>> >>>>>> in the CAS PGE environment.
>> >>
>> >> Interesting. I have a working example here [1] you can look at that
>> >> does this exact thing.
>> >>
>> >>>>>> So, if "batchstub" is necessary in this scenario, please tell me
>> >>>>>> what it is, why it is necessary, and how to run it (please provide
>> >>>>>> exact syntax to put in my startup shell script, because I would
>> >>>>>> never be able to figure it out for myself and I don't want to have
>> >>>>>> to bother everyone again.)
>> >>
>> >> Batchstub is only necessary if your Workflow Manager is sending jobs to
>> >> Resource Manager for execution (where the default execution is to run the
>> >> job in something called a "batch stub" executable). Think of batch stubs
>> >> as a small wrapper program that takes a bundle of executable instructions
>> >> from Resource Manager, and executes them in a shell environment within a
>> >> given remote (or local) machine.
>> >>
>> >> Here's my suggestion:
>> >> 1. Like Paul suggested, go to $OODT_HOME/resmgr/bin, and execute the
>> >> following command (it'll start a batch stub in a terminal on port 2001):
>> >>> ./batch_stub 2001
>> >>
>> >> If the above step doesn't fix your problem, you can also try having
>> >> Workflow Manager NOT send jobs to Resource Manager for execution, and
>> >> instead execute jobs locally through Workflow Manager itself (on localhost
>> >> only!). To disable job transfer to Resource Manager, you'll need to modify
>> >> the Workflow Manager properties file
>> >> ($OODT_HOME/wmgr/etc/workflow.properties), and specifically comment out
>> >> the "org.apache.oodt.cas.workflow.engine.resourcemgr.url" line.
>> >> I've done this in my example code below, see [2] for an exact example of
>> >> this. After modifying workflow.properties, make sure to restart Workflow
>> >> Manager ($OODT_HOME/wmgr/bin/wmgr stop followed by
>> >> $OODT_HOME/wmgr/bin/wmgr start).
>> >>
>> >> Thanks,
>> >> Rishi
>> >>
>> >> [1] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/pge/src/main/resources/policy/netscan-getipv4entriesrandomsample.xml
>> >> [2] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/etc/workflow.properties
>> >>
>> >> On Oct 8, 2014, at 2:31 PM, Ramirez, Paul M (398J)
>> >> <paul.m.ramirez@jpl.nasa.gov> wrote:
>> >>
>> >>> Valerie,
>> >>>
>> >>> I would have thought it would have just not used a batch stub by
>> >>> default. That said, if you go into the $OODT_HOME/resmgr/bin there
>> >>> should be a script to start a batch stub. Right now on my phone I forget
>> >>> the name of the script, but if you more the file you will see the Java
>> >>> class name that corresponds to below. You should specify a port when you
>> >>> run the script, which from the looks of the output below should be 2001.
>> >>>
>> >>> HTH,
>> >>> Paul R
>> >>>
>> >>> Sent from my iPhone
>> >>>
>> >>>> On Oct 8, 2014, at 2:04 PM, Mallder, Valerie
>> >>>> <Valerie.Mallder@jhuapl.edu> wrote:
>> >>>>
>> >>>> Well then, I'm proud to be a member :)  (I think .... )
>> >>>>
>> >>>>
>> >>>> Valerie A. Mallder
>> >>>> New Horizons Deputy Mission System Engineer Johns Hopkins
>> >>>> University/Applied Physics Laboratory
>> >>>>
>> >>>>
>> >>>>> -----Original Message-----
>> >>>>> From: Bruce Barkstrom [mailto:brbarkstrom@gmail.com]
>> >>>>> Sent: Wednesday, October 08, 2014 4:54 PM
>> >>>>> To: dev@oodt.apache.org
>> >>>>> Subject: Re: what is batch stub? Is it necessary?
>> >>>>>
>> >>>>> You have every right to bother everyone.
>> >>>>> You won't get what you need unless you do.
>> >>>>>
>> >>>>> You get one honorary membership in the Society of General Agitators
>> >>>>> - at the rank of Major Agitator.
>> >>>>>
>> >>>>> Bruce B.
>> >>>>>
>> >>>>> On Wed, Oct 8, 2014 at 4:49 PM, Mallder, Valerie
>> >>>>> <Valerie.Mallder@jhuapl.edu> wrote:
>> >>>>>
>> >>>>>> Hello,
>> >>>>>>
>> >>>>>> I am still having trouble getting my CAS PGE crawler task to run
>> >>>>>> due to http://localhost:2001 being "down". I have spent the last 2
>> >>>>>> days tracing through the resource manager code and tracked this down
>> >>>>>> to line 146 of LRUScheduler, where the XmlRpcBatchMgr is failing to
>> >>>>>> execute the task remotely, because line 75 of XmlRpcBatchMgrProxy
>> >>>>>> (which was instantiated by XmlRpcBatchMgr on its line 74) is trying
>> >>>>>> to call "isAlive" on the webservice named "batchstub" which, to my
>> >>>>>> knowledge, is not running because I have not done anything
>> >>>>>> explicitly to run it.
>> >>>>>>
>> >>>>>> All I am trying to do is run "crawler_launcher" as a workflow task
>> >>>>>> in the CAS PGE environment. I had it running perfectly before I
>> >>>>>> started trying to make it run as part of a workflow. I really miss
>> >>>>>> my crawler and really want it to run again :(
>> >>>>>>
>> >>>>>> So, if "batchstub" is necessary in this scenario, please tell me
>> >>>>>> what it is, why it is necessary, and how to run it (please provide
>> >>>>>> exact syntax to put in my startup shell script, because I would
>> >>>>>> never be able to figure it out for myself and I don't want to have
>> >>>>>> to bother everyone again.)
>> >>>>>>
>> >>>>>> Thanks so much!
>> >>>>>>
>> >>>>>> Val
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> Valerie A. Mallder
>> >>>>>>
>> >>>>>> New Horizons Deputy Mission System Engineer The Johns Hopkins
>> >>>>>> University/Applied Physics Laboratory
>> >>>>>> 11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
>> >>>>>> 240-228-7846 (Office) 410-504-2233 (Blackberry)
>> >>>>>>
>> >>>>>>
>> >>
>> >> ---
>> >> Rishi Verma
>> >> NASA Jet Propulsion Laboratory
>> >> California Institute of Technology
>> >
>>
>> ---
>> Rishi Verma
>> NASA Jet Propulsion Laboratory
>> California Institute of Technology
>>
>>
>
>
>-- 
>*Lewis*

