oodt-dev mailing list archives

From "Mallder, Valerie" <Valerie.Mall...@jhuapl.edu>
Subject RE: what is batch stub? Is it necessary?
Date Thu, 09 Oct 2014 16:06:44 GMT
What are XSDs? And what do you mean by "restrictions"? Do you mean 'definitions' of what
can be included within the task and configuration XML files? If so, then I totally agree
with you.

Valerie A. Mallder
New Horizons Deputy Mission System Engineer
Johns Hopkins University/Applied Physics Laboratory


> -----Original Message-----
> From: Ramirez, Paul M (398J) [mailto:paul.m.ramirez@jpl.nasa.gov]
> Sent: Wednesday, October 08, 2014 10:38 PM
> To: <dev@oodt.apache.org>
> Subject: Re: what is batch stub? Is it necessary?
>
> +1 billion
>
> --Paul
>
> Sent from my iPhone
>
> > On Oct 8, 2014, at 5:55 PM, Lewis John Mcgibbney
> <lewis.mcgibbney@gmail.com> wrote:
> >
> > Folks,
> > Is it possible to create a parent issue for defining XSDs for all of
> > the XML files we need in OODT?
> > I do not know them all, but from this thread alone, it is clear that
> > we could do with setting some kind of restrictions on what can be
> > included within task and configuration XML within OODT.
> > Thoughts?
> > Lewis
> >
> > On Wed, Oct 8, 2014 at 5:44 PM, Verma, Rishi (398J) <
> > Rishi.Verma@jpl.nasa.gov> wrote:
> >
> >> Hi Val,
> >>
> >> Yep - here's a link to the tasks.xml file:
> >>
> >> https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/policy/tasks.xml
> >>
> >>> The problem is that the ExternScriptTaskInstance is unable to
> >>> recognize the command line arguments that I want to pass to the
> >>> crawler_launcher script.
> >>
> >>
> >> Hmm.. could you share your workflow manager log, or better yet, the
> >> batch_stub output? Curious to see what error is thrown.
> >>
> >> Is a script file being generated for your PGE? For example, inside
> >> your [PGE_HOME] directory, and within the particular job directory
> >> created for your execution of a workflow, you will see some files
> >> starting with "sciPgeExeScript_...". You'll find one for your
> >> pgeConfig, and you can check to see what the PGE commands actually
> >> translate into, with respect to a shell script format. If that file
> >> is there, take a look at it, and validate whether the command works
> >> within the script (i.e. copy/paste and run the crawler command manually).
> >>
> >> Another suggestion is to take a step back, and build up slowly, i.e.:
> >> 1. Do an "echo" command within your PGE first (e.g. <cmd>echo
> >> "Hello APL." > /tmp/test.txt</cmd>).
> >> 2. If the above works, run an empty crawler_launcher command (e.g.
> >> <cmd>/path/to/oodt/crawler/bin/crawler_launcher</cmd>) and verify that
> >> the batch_stub or Workflow Manager prints some kind of output when you
> >> run the workflow.
> >> 3. Build up your crawler_launcher command piece by piece to see where
> >> it is failing
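
A minimal sketch of the "piece by piece" idea in Python, assuming the argument groups from the pgeConfig quoted later in this thread. `incremental_commands` is a hypothetical helper, not part of OODT. It prints progressively longer commands that could be pasted into <cmd> one at a time until the failing argument shows up:

```python
def incremental_commands(executable, arg_groups):
    """Yield the bare executable, then the command with one more argument group each time."""
    parts = [executable]
    yield executable
    for group in arg_groups:
        parts.extend(group)
        yield " ".join(parts)


# Argument groups copied from the pgeConfig in this thread; the bracketed
# names are placeholders that are expected to be substituted before execution.
CRAWLER = "[CRAWLER_HOME]/bin/crawler_launcher"
GROUPS = [
    ["--operation", "--launchAutoCrawler"],
    ["--filemgrUrl", "[FILEMGR_URL]"],
    ["--clientTransferer",
     "org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory"],
    ["--productPath", "[JobInputDir]"],
    ["--mimeExtractorRepo",
     "[OODT_HOME]/extensions/policy/mime-extractor-map.xml"],
    ["--actionIds", "MoveFileToLevel0Dir"],
]

for step, command in enumerate(incremental_commands(CRAWLER, GROUPS)):
    print(f"step {step}: {command}")
```

Grouping each flag with its value keeps every intermediate command syntactically complete, so a failure points at the group that was just added.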
> >>
> >> Thanks,
> >> Rishi
> >>
> >> On Oct 8, 2014, at 4:24 PM, Mallder, Valerie
> >> <Valerie.Mallder@jhuapl.edu>
> >> wrote:
> >>
> >>> Hi Rishi,
> >>>
> >>> Thank you very much for pointing me to your working example. This is
> >>> very helpful.  My pgeConfig looks very similar to yours.  So, I
> >>> commented out the resource manager like you suggested and tried
> >>> running again without the resource manager. And my problem still
> >>> exists. The problem is that the ExternScriptTaskInstance is unable to
> >>> recognize the command line arguments that I want to pass to the
> >>> crawler_launcher script. Could you send me a link to your tasks.xml
> >>> file? I'm curious as to how you defined your task.
> >>> My pgeConfig and tasks.xml are below.
> >>>
> >>> Thanks!
> >>> Val
> >>>
> >>>
> >>> <?xml version="1.0" encoding="UTF-8"?>
> >>> <pgeConfig>
> >>>
> >>>  <!-- How to run the PGE -->
> >>>  <exe dir="[JobDir]" shell="/bin/sh" envReplace="true">
> >>>       <cmd>[CRAWLER_HOME]/bin/crawler_launcher --operation --launchAutoCrawler \
> >>>       --filemgrUrl [FILEMGR_URL] \
> >>>       --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
> >>>       --productPath [JobInputDir] \
> >>>       --mimeExtractorRepo [OODT_HOME]/extensions/policy/mime-extractor-map.xml \
> >>>       --actionIds MoveFileToLevel0Dir</cmd>
> >>>  </exe>
> >>>
> >>>  <!-- Files to ingest -->
> >>>  <output>
> >>>  </output>
> >>>
> >>> <!-- Custom metadata to add to output files -->
> >>> <customMetadata>
> >>>     <metadata key="JobDir" val="[OODT_HOME]"/>
> >>>     <metadata key="JobInputDir" val="[FEI_DROP_DIR]"/>
> >>>     <metadata key="JobOutputDir" val="[JobDir]/data/pge/jobs"/>
> >>>     <metadata key="JobLogDir" val="[JobDir]/data/pge/logs"/>
> >>> </customMetadata>
> >>>
> >>> </pgeConfig>
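
Short of a full XSD, a plain well-formedness check can catch mismatched or unclosed tags in policy files before they are loaded. This is only a sketch: `check_well_formed` is a hypothetical helper and the two sample strings are made up for illustration. Python's standard library does not validate against an XSD, so this checks syntax only, not which elements are allowed:

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)


# Made-up fragments: the first parses, the second closes a tag that was
# already self-closed and is rejected by the parser.
good = "<pgeConfig><output></output></pgeConfig>"
bad = "<pgeConfig><output/></output></pgeConfig>"

for label, text in (("good", good), ("bad", bad)):
    problem = check_well_formed(text)
    print(label, "->", "well-formed" if problem is None else problem)
```

To check a real file, read its contents and pass them to the same function; the error message includes the line and column reported by the parser.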
> >>>
> >>>
> >>>
> >>> <!-- tasks.xml **************************************************-->
> >>>
> >>> <cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
> >>>
> >>>  <task id="urn:oodt:crawlerLauncherId" name="crawlerLauncherName"
> >>>        class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance">
> >>>     <conditions/>  <!-- There are no pre execution conditions right now -->
> >>>     <configuration>
> >>>
> >>>         <property name="ShellType" value="/bin/sh" />
> >>>         <property name="PathToScript"
> >>>                   value="[CRAWLER_HOME]/bin/crawler_launcher" envReplace="true" />
> >>>
> >>>         <property name="PGETask_Name" value="crawler_launcher PGE Task"/>
> >>>         <property name="PGETask_ConfigFilePath"
> >>>                   value="[OODT_HOME]/extensions/config/crawler-pge-config.xml"
> >>>                   envReplace="true" />
> >>>     </configuration>
> >>>  </task>
> >>>
> >>> </cas:tasks>
> >>>
> >>> Valerie A. Mallder
> >>> New Horizons Deputy Mission System Engineer Johns Hopkins
> >>> University/Applied Physics Laboratory
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Verma, Rishi (398J) [mailto:Rishi.Verma@jpl.nasa.gov]
> >>>> Sent: Wednesday, October 08, 2014 6:01 PM
> >>>> To: dev@oodt.apache.org
> >>>> Subject: Re: what is batch stub? Is it necessary?
> >>>>
> >>>> Hi Valerie,
> >>>>
> >>>>>>>> All I am trying to do is run "crawler_launcher" as a workflow
> >>>>>>>> task in the CAS PGE environment.
> >>>>
> >>>> Interesting. I have a working example here [1] you can look at that
> >>>> does this exact thing.
> >>>>
> >>>>>>>> So, if "batchstub" is necessary in this scenario, please tell
> >>>>>>>> me what it is, why it is necessary, and how to run it (please
> >>>>>>>> provide exact syntax to put in my startup shell script, because
> >>>>>>>> I would never be able to figure it out for myself and I don't
> >>>>>>>> want to have to bother everyone again.)
> >>>>
> >>>> Batchstub is only necessary if your Workflow Manager is sending jobs
> >>>> to Resource Manager for execution (where the default execution is to
> >>>> run the job in something called a "batch stub" executable). Think of
> >>>> a batch stub as a small wrapper program that takes a bundle of
> >>>> executable instructions from Resource Manager and executes them in
> >>>> a shell environment on a given remote (or local) machine.
> >>>>
> >>>> Here's my suggestion:
> >>>> 1. Like Paul suggested, go to $OODT_HOME/resmgr/bin, and execute
> >>>> the following command (it'll start a batch stub in a terminal on port 2001):
> >>>>> ./batch_stub 2001
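
One way to confirm that something is actually listening on the batch stub port before re-running the workflow is a plain TCP connect. This is a rough sketch in Python: `is_port_open` is a hypothetical helper, and it only shows that the port accepts connections, not that the XML-RPC "isAlive" call mentioned in this thread would succeed:

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


# 2001 is the batch stub port used in this thread.
print("batch stub reachable:", is_port_open("localhost", 2001))
```

If this prints False right after starting the batch stub, the stub is probably not running or is bound to a different port.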
> >>>>
> >>>> If the above step doesn't fix your problem, you can also try having
> >>>> Workflow Manager NOT send jobs to Resource Manager for execution, and
> >>>> instead execute jobs locally through Workflow Manager itself (on
> >>>> localhost only!). To disable job transfer to Resource Manager, you'll
> >>>> need to modify the Workflow Manager properties file
> >>>> ($OODT_HOME/wmgr/etc/workflow.properties), and specifically comment
> >>>> out the "org.apache.oodt.cas.workflow.engine.resourcemgr.url" line.
> >>>> I've done this in my example code below, see [2] for an exact example
> >>>> of this. After modifying workflow.properties, make sure to restart
> >>>> Workflow Manager ($OODT_HOME/wmgr/bin/wmgr stop followed by
> >>>> $OODT_HOME/wmgr/bin/wmgr start).
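
The properties edit described above can be sketched in Python as follows. `comment_out_property` is a hypothetical helper, and the sample file contents (including the URL value and the other key) are invented for illustration. Only the property key comes from this thread. Java properties files treat lines starting with "#" as comments:

```python
KEY = "org.apache.oodt.cas.workflow.engine.resourcemgr.url"


def comment_out_property(text, key):
    """Prefix the line that sets `key` with '#', leaving every other line unchanged."""
    out = []
    for line in text.splitlines(keepends=True):
        stripped = line.lstrip()
        rest = stripped[len(key):].lstrip()
        # Match only "key = value" or "key : value", not keys that merely share a prefix.
        if stripped.startswith(key) and rest[:1] in ("=", ":"):
            out.append("#" + line)
        else:
            out.append(line)
    return "".join(out)


# Invented sample contents; a real workflow.properties will look different.
sample = (
    "some.other.key=value\n"
    "org.apache.oodt.cas.workflow.engine.resourcemgr.url=http://localhost:9002\n"
)
print(comment_out_property(sample, KEY))
```

Running the function a second time leaves the text unchanged, because the commented line no longer starts with the key. Workflow Manager still needs the restart described above to pick up the change.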
> >>>>
> >>>> Thanks,
> >>>> Rishi
> >>>>
> >>>> [1] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/pge/src/main/resources/policy/netscan-getipv4entriesrandomsample.xml
> >>>> [2] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/etc/workflow.properties
> >>>>
> >>>> On Oct 8, 2014, at 2:31 PM, Ramirez, Paul M (398J)
> >>>> <paul.m.ramirez@jpl.nasa.gov> wrote:
> >>>>
> >>>>> Valerie,
> >>>>>
> >>>>> I would have thought it would have just not used a batch stub by
> >>>>> default. That said, if you go into $OODT_HOME/resmgr/bin there
> >>>>> should be a script to start a batch stub. Right now on my phone I
> >>>>> forget the name of the script, but if you "more" the file you will
> >>>>> see the Java class name that corresponds to below. You should
> >>>>> specify a port when you run the script, which from the looks of the
> >>>>> output below should be 2001.
> >>>>>
> >>>>> HTH,
> >>>>> Paul R
> >>>>>
> >>>>> Sent from my iPhone
> >>>>>
> >>>>>> On Oct 8, 2014, at 2:04 PM, Mallder, Valerie
> >>>>>> <Valerie.Mallder@jhuapl.edu> wrote:
> >>>>>>
> >>>>>> Well then, I'm proud to be a member :)  (I think .... )
> >>>>>>
> >>>>>>
> >>>>>> Valerie A. Mallder
> >>>>>> New Horizons Deputy Mission System Engineer Johns Hopkins
> >>>>>> University/Applied Physics Laboratory
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Bruce Barkstrom [mailto:brbarkstrom@gmail.com]
> >>>>>>> Sent: Wednesday, October 08, 2014 4:54 PM
> >>>>>>> To: dev@oodt.apache.org
> >>>>>>> Subject: Re: what is batch stub? Is it necessary?
> >>>>>>>
> >>>>>>> You have every right to bother everyone.
> >>>>>>> You won't get what you need unless you do.
> >>>>>>>
> >>>>>>> You get one honorary membership in the Society of General
> >>>>>>> Agitators
> >>>>>>> - at the rank of Major Agitator.
> >>>>>>>
> >>>>>>> Bruce B.
> >>>>>>>
> >>>>>>> On Wed, Oct 8, 2014 at 4:49 PM, Mallder, Valerie
> >>>>>>> <Valerie.Mallder@jhuapl.edu> wrote:
> >>>>>>>
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> I am still having trouble getting my CAS PGE crawler task to
> >>>>>>>> run due to http://localhost:2001 being "down". I have spent the
> >>>>>>>> last 2 days tracing through the resource manager code and tracked
> >>>>>>>> this down to line 146 of LRUScheduler, where the XmlRpcBatchMgr is
> >>>>>>>> failing to execute the task remotely because line 75 of
> >>>>>>>> XmlRpcBatchMgrProxy (which was instantiated by XmlRpcBatchMgr on
> >>>>>>>> its line 74) is trying to call "isAlive" on the webservice
> >>>>>>>> named "batchstub" which, to my knowledge, is not running
> >>>>>>>> because I have not done anything explicitly to run it.
> >>>>>>>>
> >>>>>>>> All I am trying to do is run "crawler_launcher" as a workflow
> >>>>>>>> task in the CAS PGE environment.  I had it running perfectly
> >>>>>>>> before I started trying to make it run as part of a workflow.
> >>>>>>>> I really miss my crawler and really want it to run again :(
> >>>>>>>>
> >>>>>>>> So, if "batchstub" is necessary in this scenario, please tell
> >>>>>>>> me what it is, why it is necessary, and how to run it (please
> >>>>>>>> provide exact syntax to put in my startup shell script, because
> >>>>>>>> I would never be able to figure it out for myself and I don't
> >>>>>>>> want to have to bother everyone again.)
> >>>>>>>>
> >>>>>>>> Thanks so much!
> >>>>>>>>
> >>>>>>>> Val
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Valerie A. Mallder
> >>>>>>>>
> >>>>>>>> New Horizons Deputy Mission System Engineer The Johns Hopkins
> >>>>>>>> University/Applied Physics Laboratory
> >>>>>>>> 11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
> >>>>>>>> 240-228-7846 (Office) 410-504-2233 (Blackberry)
> >>>>
> >>>> ---
> >>>> Rishi Verma
> >>>> NASA Jet Propulsion Laboratory
> >>>> California Institute of Technology
> >>
> >> ---
> >> Rishi Verma
> >> NASA Jet Propulsion Laboratory
> >> California Institute of Technology
> >
> >
> > --
> > *Lewis*
