oodt-dev mailing list archives

From "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: re: Question about OODT file manager
Date Mon, 27 Oct 2014 05:18:15 GMT
Hi Luke,

Thanks and sorry it’s taken me a while to reply. Here
are some details below:


-----Original Message-----
From: Luke <shuailiu@usc.edu>
Date: Sunday, October 26, 2014 at 6:19 PM
To: Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>, 'Zichuan Wang'
<zichuanw@usc.edu>
Cc: Chris Mattmann <mattmann@usc.edu>, "zhoujian@usc.edu"
<zhoujian@usc.edu>, "xiaoyanj@usc.edu" <xiaoyanj@usc.edu>,
"dev@oodt.apache.org" <dev@oodt.apache.org>
Subject: RE: re: Question about OODT file manager

>Hi Professor Mattmann and OODT DEV,
> 
>Sorry to trouble you with this email. Our team has been struggling to use
>OODT to send JSON files to Solr.
>One of the difficulties is still getting an OODT workflow to call
>poster.py in etllib.

Sorry that you’re having difficulty; let me try to help.

> 
>I am not sure whether my understanding of the OODT requirement is correct.
>I hope you can kindly advise and help clear up our confusion.
> 
>My understanding of the goals with OODT is as follows; please kindly
>confirm and clarify:
> 
>1) Get the File-Manager up and running.

Yep, hopefully as installed via OODT RADIX.

>2) Send all JSON files to the File Manager server with the wmgr-client
>command. (I believe we can achieve this with a bash script, or probably a
>Python script, that calls the command line sequentially with each JSON
>file name as an argument?)

Suggestion:

1. Use the OODT crawler and file manager to crawl/index the JSON files (in
place data transfer).
2. Take a look at CAS-PGE, it will help you write a workflow task that
will wrap ETLlib and the poster command.
3. Once you are confident with #2, whip up a script that pages through all
of your indexed JSON files, and then for each one, submits a workflow event
(you may need to look into aggregating them) that calls your CAS-PGE
wrapped poster task from ETLlib.
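
A minimal sketch of step 3, under stated assumptions: it walks a local directory for *.json files as a stand-in for paging through the file manager catalog, groups the files into batches (the aggregation idea), and builds one command per batch. The function names, the batch size, and the `echo` placeholder command are all hypothetical; the real workflow manager client invocation and event name have to come from your own deployment.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterator, List, Sequence


def find_json_files(root: str) -> List[Path]:
    """Return every *.json file under root, sorted for a stable order."""
    return sorted(Path(root).rglob("*.json"))


def batches(items: Sequence[Path], size: int) -> Iterator[List[Path]]:
    """Yield consecutive groups of at most `size` items (the aggregation step)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def submit_all(root: str, size: int,
               build_command: Callable[[List[Path]], List[str]],
               dry_run: bool = True) -> List[List[str]]:
    """Build one command per batch and run it unless dry_run is set."""
    commands = []
    for group in batches(find_json_files(root), size):
        command = build_command(group)
        commands.append(command)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, check=True)
    return commands


def example_command(group: List[Path]) -> List[str]:
    # Placeholder only (assumption): replace with the real workflow
    # manager client call and event name used in your installation.
    return ["echo", "submit-event", *[str(p) for p in group]]
```

Calling `submit_all("/path/to/json", 50, example_command)` with the default `dry_run=True` only returns the commands it would run, which makes it easy to inspect them before submitting anything.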

>3) Once the JSON files have been sent to and stored in the File-Manager,
>we need to get the workflow-manager up and running, and then we can create
>a workflow that sends those JSON files from the file manager to Solr.

See above.

>4) Create a workflow according to the Workflow2 User Guide
><https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide>.
>Here comes the problem:
>I am not sure how to create a workflow task that can call poster.py in
>the Python etllib. It looks like we need to create our own Java class
>that extends TaskInstance, an abstract Java class with one abstract
>method that has the following signature:
>
>    protected abstract ResultsState performExecution(ControlMetadata
>    crtlMetadata);
>
>However, that page does not say where to find the corresponding libraries
>or where to put our implementation in the workflow manager. I am not sure
>whether we should use TaskInstance, but it seems the workflow needs an
>interface through which it can call the Python code, i.e. poster.py, and
>it looks like we need to implement TaskInstance::performExecution with
>code that calls poster.py and returns the result state.
> 
> 
>It would be greatly appreciated if you could shed some light and advise
>how we can get a task instance to call poster.py. I am also not sure
>whether my understanding is correct, so please kindly correct anything
>that is wrong. Your help is appreciated as usual.
> 
> 
> 
>Thanks
>Luke

Thanks Luke, see above. Let me know if it helps.

Cheers!

Chris

> 
>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>
>Sent: October 25, 2014 13:34
>To: Zichuan Wang
>Cc: Christian Alan Mattmann; Luke; zhoujian@usc.edu; xiaoyanj@usc.edu
>Subject: Re: Re: Question about OODT file manager
>
>
> 
>Please cc dev@oodt.apache.org; I will reply in detail soon.
>
>Sent from my iPhone


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






>
>
>On Oct 25, 2014, at 1:26 PM, "Zichuan Wang" <zichuanw@usc.edu> wrote:
>
>
>Dear Professor,
>
> 
>
>Could you please also explain how I can crawl all JSON file names under a
>specific directory using CAS-PGE? I’ll work through this example,
>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example,
>but it doesn’t mention anything about crawling; instead it manually sets
>the input file paths...
>
>
> 
>
>-- 
>
>Zichuan Wang
>
>University of Southern California, Department of Computer Science
>
> 
>
>
>On Saturday, October 25, 2014, at 12:10 PM, Zichuan Wang wrote:
>
>Dear Professor, 
>
> 
>
>In the assignment 2 specification I noticed that you mentioned the OODT
>File Manager, but from my understanding, we are using the ETLLib poster,
>which talks directly to Solr. So how can we use the OODT File Manager in
>this assignment?
>
> 
>
>-- 
>
>Zichuan Wang
>
>University of Southern California, Department of Computer Science
