oodt-dev mailing list archives

From Chris Mattmann <chris.mattm...@gmail.com>
Subject Re: re: Question about OODT file manager
Date Thu, 06 Nov 2014 14:45:32 GMT
woot thanks

------------------------
Chris Mattmann
chris.mattmann@gmail.com




-----Original Message-----
From: Zichuan Wang <zichuanw@usc.edu>
Reply-To: <dev@oodt.apache.org>
Date: Wednesday, November 5, 2014 at 11:22 PM
To: Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>
Cc: Chris Mattmann <mattmann@usc.edu>, <dev@oodt.apache.org>, Luke liu
<shuailiu@usc.edu>, <xiaoyanj@usc.edu>, <zhoujian@usc.edu>
Subject: Re: re: Question about OODT file manager

>Googled around and found this little trick:
>
>export JAVA_OPTS=-Xmx2048m
>
>
>It works now, thanks professor!
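
When the crawl is driven from a script rather than an interactive shell, the same heap setting can be handed to the child process through its environment. A minimal Python sketch, assuming the launcher script honors JAVA_OPTS as reported above; the helper name heap_env is hypothetical and not part of OODT:

```python
import os


def heap_env(max_heap="2048m", base=None):
    """Return a copy of the environment with JAVA_OPTS carrying -Xmx<max_heap>.

    heap_env is a hypothetical helper for illustration only.
    """
    env = dict(os.environ if base is None else base)
    existing = env.get("JAVA_OPTS", "").split()
    # Drop any earlier -Xmx flag so the new limit is the only one present.
    kept = [opt for opt in existing if not opt.startswith("-Xmx")]
    env["JAVA_OPTS"] = " ".join(kept + ["-Xmx" + max_heap])
    return env


if __name__ == "__main__":
    print(heap_env()["JAVA_OPTS"])
```

The returned dictionary can be passed as the env argument of subprocess.run when launching ./crawler_launcher with the options shown later in this thread.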
>
>
>—
>Zichuan Wang
>Department of Computer Science, USC
>
>On Wed, Nov 5, 2014 at 10:40 PM, Mattmann, Chris A (3980)
><chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> Got it. Can you increase the heap space on your batch stub? That
>> should take care of it.
>> Cheers,
>> Chris
>> P.S. Great work!
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Chief Architect
>> Instrument Software and Science Data Systems Section (398)
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 168-519, Mailstop: 168-527
>> Email: chris.a.mattmann@nasa.gov
>> WWW:  http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Associate Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> -----Original Message-----
>> From: Zichuan Wang <zichuanw@usc.edu>
>> Date: Wednesday, November 5, 2014 at 11:12 PM
>> To: Chris Mattmann <mattmann@usc.edu>
>> Cc: Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>,
>>"dev@oodt.apache.org"
>> <dev@oodt.apache.org>, Luke liu <shuailiu@usc.edu>, "xiaoyanj@usc.edu"
>> <xiaoyanj@usc.edu>, "zhoujian@usc.edu" <zhoujian@usc.edu>
>> Subject: Re: re: Question about OODT file manager
>>>Dear Professor,
>>>
>>>
>>>I finally figured out how to trigger a post-ingest event. However, when I
>>>tried to crawl the whole dataset, I got an OutOfMemoryError. Could you
>>>please take a look and maybe give some suggestions?
>>>
>>>
>>>➜  bin  ./crawler_launcher \
>>>--operation --launchAutoCrawler \
>>>--filemgrUrl http://localhost:9000 \
>>>--clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
>>>--productPath /Users/zichuanwang/Downloads/output \
>>>--mimeExtractorRepo ../policy/mime-extractor-map.xml \
>>>--workflowMgrUrl http://localhost:9200 \
>>>-ais TriggerPostIngestWorkflow
>>>Setting property 'AutoDetectProductCrawler.mimeExtractorRepo'
>>>Setting property 'StdProductCrawler.clientTransferer'
>>>Setting property 'MetExtractorProductCrawler.clientTransferer'
>>>Setting property 'AutoDetectProductCrawler.clientTransferer'
>>>Setting property 'StdProductCrawler.filemgrUrl'
>>>Setting property 'MetExtractorProductCrawler.filemgrUrl'
>>>Setting property 'AutoDetectProductCrawler.filemgrUrl'
>>>Setting property 'TriggerPostIngestWorkflow.workflowMgrUrl'
>>>Setting property 'StdProductCrawler.actionIds'
>>>Setting property 'MetExtractorProductCrawler.actionIds'
>>>Setting property 'AutoDetectProductCrawler.actionIds'
>>>Setting property 'StdProductCrawler.productPath'
>>>Setting property 'MetExtractorProductCrawler.productPath'
>>>Setting property 'AutoDetectProductCrawler.productPath'
>>>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>>: Property 'StdProductCrawler.productPath' set to value [/Users/zichuanwang/Downloads/output]
>>>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>>: Property 'TriggerPostIngestWorkflow.workflowMgrUrl' set to value [http://localhost:9200]
>>>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>>: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to value [../policy/mime-extractor-map.xml]
>>>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>>: Property 'MetExtractorProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
>>>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>>: Property 'AutoDetectProductCrawler.filemgrUrl' set to value [http://localhost:9000]
>>>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>>: Property 'AutoDetectProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
>>>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>>: Property 'MetExtractorProductCrawler.actionIds' set to value [TriggerPostIngestWorkflow]
>>>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>>: Property 'StdProductCrawler.actionIds' set to value [TriggerPostIngestWorkflow]
>>>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>>: Property 'StdProductCrawler.filemgrUrl' set to value [http://localhost:9000]
>>>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>>: Property 'AutoDetectProductCrawler.actionIds' set to value [TriggerPostIngestWorkflow]
>>>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>>: Property 'AutoDetectProductCrawler.productPath' set to value [/Users/zichuanwang/Downloads/output]
>>>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>>: Property 'MetExtractorProductCrawler.filemgrUrl' set to value [http://localhost:9000]
>>>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>>: Property 'StdProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
>>>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>>: Property 'MetExtractorProductCrawler.productPath' set to value [/Users/zichuanwang/Downloads/output]
>>>Nov 5, 2014 10:07:47 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
>>>INFO: Crawling /Users/zichuanwang/Downloads/output
>>>Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>>at java.io.UnixFileSystem.list(Native Method)
>>>at java.io.File.list(File.java:973)
>>>at java.io.File.listFiles(File.java:1129)
>>>at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
>>>at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
>>>at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
>>>at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
>>>at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
>>>at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
>>>
>>>
>>>—
>>>Zichuan Wang
>>>Department of Computer Science, USC
>>>
>>>
>>>On Wed, Nov 5, 2014 at 6:42 PM, Christian Alan Mattmann
>>><mattmann@usc.edu> wrote:
>>>
>>>
>>>Thanks Luke, I’ve given you permissions so you should now see an
>>>“edit” button on that wiki page.
>>>
>>>Cheers, 
>>>Chris 
>>>
>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>Chris Mattmann, Ph.D.
>>>Adjunct Associate Professor, Computer Science Department
>>>University of Southern California
>>>Los Angeles, CA 90089 USA
>>>Email: mattmann@usc.edu
>>>WWW: http://sunset.usc.edu/~mattmann/
>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>
>>>
>>>
>>>-----Original Message-----
>>>From: Luke liu <shuailiu@usc.edu>
>>>Date: Wednesday, November 5, 2014 at 6:48 PM
>>>To: Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>,
>>>"dev@oodt.apache.org"
>>><dev@oodt.apache.org>
>>>Cc: Chris Mattmann <mattmann@usc.edu>, "zhoujian@usc.edu"
>>><zhoujian@usc.edu>, "xiaoyanj@usc.edu" <xiaoyanj@usc.edu>, 'Zichuan
>>>Wang'
>>><zichuanw@usc.edu>
>>>Subject: RE: re: Question about OODT file manager
>>>
>>>>I just signed up on the wiki (i.e. https://cwiki.apache.org) with the
>>>>following account details:
>>>> Account name: luke
>>>> Full Name: Shuai Liu (Luke)
>>>> Email: hanson311biz@gmail.com
>>>> Password: *******
>>>> 
>>>>But I am not sure where I can add my notes to the following web article,
>>>>the one I had trouble with. I also tried to create a new article, but
>>>>failed because I cannot find anywhere to edit. Does this have something
>>>>to do with my account not being given the "edit" or "comments" actions?
>>>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>>>> 
>>>> 
>>>> 
>>>>Thanks 
>>>>Luke 
>>>>-----Original Message-----
>>>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>Sent: Sunday, November 2, 2014 6:59 AM
>>>>To: Luke liu; dev@oodt.apache.org
>>>>Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>>'Zichuan 
>>>>Wang' 
>>>>Subject: Re: re: Question about OODT file manager
>>>> 
>>>>Yes Luke, making the instructions better would be much appreciated!
>>>> 
>>>>If you have an account on the wiki please share it, else sign up for an
>>>>Apache OODT wiki account and please share it with me or anyone else on
>>>>dev@oodt and we’ll add you.
>>>> 
>>>>Cheers, 
>>>>Chris 
>>>> 
>>>>-----Original Message-----
>>>>From: Luke liu <shuailiu@usc.edu>
>>>>Date: Sunday, November 2, 2014 at 1:32 AM
>>>>To: Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>,
>>>>"dev@oodt.apache.org"
>>>><dev@oodt.apache.org>
>>>>Cc: Chris Mattmann <mattmann@usc.edu>, "zhoujian@usc.edu"
>>>><zhoujian@usc.edu>, "xiaoyanj@usc.edu" <xiaoyanj@usc.edu>, 'Zichuan
>>>>Wang' 
>>>><zichuanw@usc.edu>
>>>>Subject: RE: re: Question about OODT file manager
>>>> 
>>>>>Thanks Professor Mattmann, not running batch_stub was the main culprit,
>>>>>and there were some other issues such as missing jars. Sorry for
>>>>>not confirming this right away; my laptop was crashing, and I
>>>>>only just had time to fix it. BTW, I was able to get the cas-pge example
>>>>>to work (even though I saw in the log that the workflow failed to pass
>>>>>the pre-condition, the combined file and some metadata files, i.e. 3
>>>>>files, were still successfully ingested and placed in the output
>>>>>directory).
>>>>> 
>>>>>BTW, I think there are a lot of mistakes in the documents. Do you want
>>>>>us to help correct the document (i.e.
>>>>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example)?
>>>>>If possible, I would like to share my notes on some of the problem
>>>>>steps mentioned there.
>>>>> 
>>>>>Anyway, thanks for your help; it is appreciated.
>>>>> 
>>>>>Thanks 
>>>>>Luke 
>>>>>-----Original Message-----
>>>>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>>Sent: Saturday, November 1, 2014 10:48 AM
>>>>>To: Luke; dev@oodt.apache.org
>>>>>Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>>>'Zichuan Wang'
>>>>>Subject: Re: re: Question about OODT file manager
>>>>> 
>>>>>Dear Luke, just confirming, we solved this in class right? It had to do
>>>>>with the batch stub not being turned on.
>>>>> 
>>>>>Cheers, 
>>>>>Chris 
>>>>> 
>>>>>-----Original Message-----
>>>>>From: Luke <shuailiu@usc.edu>
>>>>>Date: Tuesday, October 28, 2014 at 12:52 PM
>>>>>To: Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>,
>>>>>"dev@oodt.apache.org"
>>>>><dev@oodt.apache.org>
>>>>>Cc: Chris Mattmann <mattmann@usc.edu>, "zhoujian@usc.edu"
>>>>><zhoujian@usc.edu>, "xiaoyanj@usc.edu" <xiaoyanj@usc.edu>,
>>>>>'Zichuan Wang' <zichuanw@usc.edu>
>>>>>Subject: RE: re: Question about OODT file manager
>>>>> 
>>>>>>Dear Professor Mattmann,
>>>>>>Thanks a lot for the kind help, it is appreciated. Sorry for only now
>>>>>>getting back to you with my appreciation. I have been
>>>>>>conducting tests with OODT based on your advice, but unfortunately I
>>>>>>am having another problem....
>>>>>> 
>>>>>>I am following the steps
>>>>>>(https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example)
>>>>>>to get a sense of how to get a workflow to work.
>>>>>>The problem is that the File-Concatenator-PGE (started by running the
>>>>>>wmgr-client command line) does not seem to be invoked or executed. I am
>>>>>>seeing the tasks stack up in the workflow manager with
>>>>>>status either "RSUBMIT" or "QUEUED", but they are not getting
>>>>>>executed (PFA: workflow_monitor.jpg). Please note that by default the
>>>>>>workflow min pool size is 6, so here comes another problem: I have 6
>>>>>>submitted tasks with status "RSUBMIT", and any new incoming tasks are
>>>>>>forwarded to the waiting queue with status "QUEUED". Please refer to
>>>>>>workflow_monitor.jpg for details, where I have 3 QUEUED workflow tasks
>>>>>>and 6 RSUBMIT tasks.
>>>>>> 
>>>>>>Question 1): I am not sure why the workflow is not being executed and
>>>>>>is hanging in the "RSUBMIT" state. After turning up the log level, I am
>>>>>>seeing the following entry in the log; I am not sure if it has anything
>>>>>>to do with the hanging problem.
>>>>>> Oct 28, 2014 3:35:07 AM
>>>>>>org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread
>>>>>>safeCheckJobComplete
>>>>>> WARNING: Exception checking completion status for job:
>>>>>>[2014-10-28T01:59:32.813-07:00]: Messsage: java.lang.Exception:
>>>>>>java.lang.NullPointerException
>>>>>> 
>>>>>>Question 2): I think any new workflow task I send with the following
>>>>>>command is currently being directed to the waiting queue because of
>>>>>>the min pool size (i.e. 6) (I can increase this to a larger number
>>>>>>though):
>>>>>> ./wmgr-client --url http://localhost:9200 --operation --sendEvent \
>>>>>> --eventName fileconcatenator-pge --metaData --key RunID testNumber1
>>>>>>If possible, I would like to know whether there is a way to
>>>>>>purge the queue and get rid of the workflow tasks in "RSUBMIT" and
>>>>>>"QUEUED" that I have already sent. Please kindly help.
>>>>>> 
>>>>>>Very sorry for troubling you with this. To be honest, I find OODT a
>>>>>>bit challenging to grasp within a short time frame, probably because
>>>>>>there is no book like "OODT in Action", as there is for Solr, and what
>>>>>>I am doing is just trial and error blended with guesswork. I don’t
>>>>>>want to make a blind guess, so it would be appreciated if you could
>>>>>>also shed some light on where I can get more logging information or
>>>>>>other ways to troubleshoot. I think it might be worth tracking what
>>>>>>happens when a workflow reaches the status "RSUBMIT" and how to get
>>>>>>logging info specific to it...
>>>>>> 
>>>>>>Again, your advice and kind help will be appreciated as usual.
>>>>>> 
>>>>>> 
>>>>>>Thanks 
>>>>>>Luke 
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Mattmann, Chris A (3980)
>>>>>>> [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>>>> Sent: October 26, 2014 22:18
>>>>>>> To: Luke; 'Zichuan Wang'
>>>>>>> Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>>>>> dev@oodt.apache.org
>>>>>>> Subject: Re: re: Question about OODT file manager
>>>>>>> 
>>>>>>> Hi Luke, 
>>>>>>> 
>>>>>>> Thanks and sorry it’s taken me a while to reply. Here are some
>>>>>>> details below:
>>>>>>> 
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Luke <shuailiu@usc.edu>
>>>>>>> Date: Sunday, October 26, 2014 at 6:19 PM
>>>>>>> To: Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>,
>>>>>>> 'Zichuan Wang' <zichuanw@usc.edu>
>>>>>>> Cc: Chris Mattmann <mattmann@usc.edu>, "zhoujian@usc.edu"
>>>>>>> <zhoujian@usc.edu>, "xiaoyanj@usc.edu" <xiaoyanj@usc.edu>,
>>>>>>> "dev@oodt.apache.org" <dev@oodt.apache.org>
>>>>>>> Subject: RE: re: Question about OODT file manager
>>>>>>> 
>>>>>>> >Hi Professor Mattmann and OODT DEV,
>>>>>>> > 
>>>>>>> >Sorry to trouble you with this email. Our team has been struggling
>>>>>>> >to send JSON files from OODT to Solr.
>>>>>>> >One of the difficulties is still getting the OODT workflow to call
>>>>>>> >poster.py in etllib.
>>>>>>> 
>>>>>>> Sorry that you’re having difficulty; let me try to help.
>>>>>>> 
>>>>>>> > 
>>>>>>> >I am not sure if my understanding of the OODT requirement is correct;
>>>>>>> >I hope you can kindly advise and help with our confusion.
>>>>>>> > 
>>>>>>> >The set of goals I have in mind with OODT is as follows; please kindly
>>>>>>> >confirm and clarify:
>>>>>>> > 
>>>>>>> >1) Get the File-Manager up and running.
>>>>>>> 
>>>>>>> Yep, hopefully as installed via OODT RADIX.
>>>>>>> 
>>>>>>> >2) Send all JSON files to the File Manager server with the
>>>>>>> >wmgr-client command. (I believe we can achieve this with a bash
>>>>>>> >script, or probably a Python one, that calls the command line
>>>>>>> >sequentially with each JSON file name as an argument?!)
>>>>>>> 
>>>>>>> Suggestion:
>>>>>>> 
>>>>>>> 1. Use the OODT crawler and file manager to crawl/index the JSON
>>>>>>>    files (in place data transfer).
>>>>>>> 2. Take a look at CAS-PGE; it will help you write a workflow task
>>>>>>>    that will wrap ETLlib and the poster command.
>>>>>>> 3. Once you are confident with #2, whip up a script that pages
>>>>>>>    through all of your indexed JSON files, and then for each one,
>>>>>>>    submits a workflow event (you may need to look into aggregating
>>>>>>>    them) that calls your CAS-PGE wrapped poster task from ETLlib.
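
The script in suggestion 3 could be sketched roughly as follows, in Python. This is a dry-run sketch under assumptions, not an OODT-provided script: it walks a local directory instead of paging through the File Manager catalog, it reuses the wmgr-client flags and the fileconcatenator-pge event name that appear elsewhere in this thread, and the metadata key InputFile and both function names are hypothetical.

```python
from pathlib import Path

WMGR_URL = "http://localhost:9200"    # workflow manager URL used in this thread
EVENT_NAME = "fileconcatenator-pge"   # placeholder: substitute your own event name


def event_command(json_path, url=WMGR_URL, event=EVENT_NAME):
    """Build the wmgr-client argument list that sends one event for one file.

    The metadata key "InputFile" is hypothetical; use whatever key your
    CAS-PGE task configuration actually reads.
    """
    return [
        "./wmgr-client", "--url", url,
        "--operation", "--sendEvent",
        "--eventName", event,
        "--metaData", "--key", "InputFile", str(json_path),
    ]


def commands_for(directory):
    """Yield one command per *.json file directly under directory, sorted by name."""
    for path in sorted(Path(directory).glob("*.json")):
        yield event_command(path)


if __name__ == "__main__":
    import sys
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for cmd in commands_for(target):
        print(" ".join(cmd))  # dry run: print each command instead of executing it
```

Printing the commands first makes it easy to check them by eye; once they look right, each list can be passed to subprocess.run from the directory that contains wmgr-client.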
>>>>>>> 
>>>>>>> >3) Once we have the JSON files sent and stored in the File Manager,
>>>>>>> >we need to get the workflow manager up and running, and then we can
>>>>>>> >create a workflow that sends those JSON files from the File Manager
>>>>>>> >to Solr.
>>>>>>> 
>>>>>>> See above. 
>>>>>>> 
>>>>>>> >4) Create a workflow according to the Workflow2 User Guide
>>>>>>> ><https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide>
>>>>>>> >Here comes the problem...
>>>>>>> >I am not sure how to create a workflow task which can call
>>>>>>> >poster.py in the Python etllib. It looks like we need to create our
>>>>>>> >own Java class that extends TaskInstance, which is an abstract Java
>>>>>>> >class with one abstract method that has the following signature:
>>>>>>> > 
>>>>>>> > 
>>>>>>> >protected abstract ResultsState performExecution(ControlMetadata
>>>>>>> >crtlMetadata);
>>>>>>> >However, the details of where to find the corresponding
>>>>>>> >libs and where to put our implementation in the workflow manager
>>>>>>> >are left out of that page. I am not sure if we should use
>>>>>>> >TaskInstance, but it seems the workflow has to have an interface
>>>>>>> >through which it can call the Python code, i.e. poster.py, and it
>>>>>>> >looks like we need to implement TaskInstance::performExecution by
>>>>>>> >adding the code that calls poster.py and returns the
>>>>>>> >ResultsState.
>>>>>>> > 
>>>>>>> > 
>>>>>>> >It would be greatly appreciated if you could shed some
>>>>>>> >light and advise how we can get a task instance to call
>>>>>>> >poster.py. BTW, I am also not sure if my understanding is correct;
>>>>>>> >please kindly correct it if it is wrong. Your help will be
>>>>>>> >appreciated as usual.
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> >Thanks 
>>>>>>> >Luke 
>>>>>>> 
>>>>>>> Thanks Luke, see above. Let me know if it helps.
>>>>>>> 
>>>>>>> Cheers! 
>>>>>>> 
>>>>>>> Chris 
>>>>>>> 
>>>>>>> > 
>>>>>>> >From: Mattmann, Chris A (3980)
>>>>>>> >[mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>>>> >Sent: October 25, 2014 13:34
>>>>>>> >To: Zichuan Wang
>>>>>>> >Cc: Christian Alan Mattmann; Luke; zhoujian@usc.edu;
>>>>>>> >xiaoyanj@usc.edu
>>>>>>> >Subject: Re: Re: Question about OODT file manager
>>>>>>> > 
>>>>>>> >Please cc dev@oodt.apache.org <mailto:dev@oodt.apache.org>. I will
>>>>>>> >reply in detail soon.
>>>>>>> > 
>>>>>>> >Sent from my iPhone
>>>>>>> 
>>>>>>> 
>>>>>>> > 
>>>>>>> > 
>>>>>>> >On Oct 25, 2014, at 1:26 PM, "Zichuan Wang" <zichuanw@usc.edu> wrote:
>>>>>>> > 
>>>>>>> > 
>>>>>>> >Dear Professor,
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> >Could you please also explain how I can crawl all JSON file names
>>>>>>> >under a specific directory using CAS-PGE? I’ll work through this
>>>>>>> >example,
>>>>>>> >https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>>>>>>> >but it doesn’t mention anything about crawling; instead it
>>>>>>> >sets the input file paths manually...
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> >-- 
>>>>>>> > 
>>>>>>> >Zichuan Wang
>>>>>>> > 
>>>>>>> >University of Southern California, Department of Computer Science
>>>>>>> > 
>>>>>>> > 
>>>>>>> >On Saturday, October 25, 2014, at 12:10 PM, Zichuan Wang wrote:
>>>>>>> > 
>>>>>>> >Dear Professor,
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> >In the assignment 2 specification I noticed that you mentioned the
>>>>>>> >OODT File Manager, but from my understanding, we are using the ETLlib
>>>>>>> >poster, which talks directly to Solr. So how can we use the OODT File
>>>>>>> >Manager in this assignment?
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> >-- 
>>>>>>> > 
>>>>>>> >Zichuan Wang
>>>>>>> > 
>>>>>>> >University of Southern California, Department of Computer Science
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>
>>>
>>>
>>>
>>>


