oodt-dev mailing list archives

From "Mattmann, Chris A (3010)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: OODT Help
Date Tue, 04 Apr 2017 01:57:29 GMT
Also this presentation is useful to understand the OODT Workflow Manager:

https://www.slideshare.net/chrismattmann/wengines-workflows-and-2-years-of-advanced-data-processing-in-apache-oodt


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 

On 4/3/17, 6:51 PM, "Mattmann, Chris A (3010)" <chris.a.mattmann@jpl.nasa.gov> wrote:

    Hi Keith,
    
    Thanks for contacting us. Yes, this is precisely the type of thing that
    OODT can help you with.
    
    As a start, I would recommend reading this guide that shows you
    how to use the algorithm wrapper, CAS-PGE. You can build a workflow
    of several of these wrappers to push out your production pipeline:
    
    https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
    
    In addition to the above guide, I would start by installing OODT RADiX, the
    quick installer:
    
    https://cwiki.apache.org/confluence/display/OODT/RADiX+Powered+By+OODT
    
    Once RADiX is installed, edit your CAS-PGE algorithm wrappers and write
    some config files. Then test out your production pipeline. If you run into trouble
    with your CAS-PGE, here’s an FAQ:
    
    https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Help+and+Documentation
    
    If you want to understand more about how metadata flows in the system, you can check
    this out:
    
    https://cwiki.apache.org/confluence/display/OODT/Understanding+the+flow+of+Metadata+during+PGE+based+Processing
    
    and this:
    
    https://cwiki.apache.org/confluence/display/OODT/Understanding+CAS-PGE+Metadata+Precendence
    
    Finally, there are two examples of full-up OODT pipelines/deployments. The first is DRAT,
    which does large-scale code license analysis via OODT MapReduce (there is a paper in the
    GitHub repo you can check out):
    
    http://github.com/chrismattmann/drat/
    
    The second is Big Translate, a large-scale MapReduce machine translation pipeline:
    
    http://github.com/chrismattmann/bigtranslate/
    
    If we can help more, let us know.
    
    Cheers,
    Chris
    
    
    
     
    
    On 4/3/17, 6:36 PM, "Keith Bannister" <keith.bannister@csiro.au> wrote:
    
        Hi,
        
        I'm trying to work out whether OODT is the right framework for me.
        
        I have a radio astronomy application. The data rate is roughly 12 TB/day.
        The data format is a custom one with all sorts of metadata flying around
        (including sky direction in lat/long coordinates).
        
        The raw data is pretty huge, and I can't store it on an OODT machine.
        The big disk I have access to won't run OODT.
        
        Basically I want to:
        
        1. Save the metadata of the raw data into an index somewhere.
        2. Run some GPU codes over the raw data. The GPU code parameters should 
        be set based on the metadata.
        3. Save the GPU results in an archive, with even more metadata
        4. Copy the raw data to a remote disk with a long-running bbcp task.
        5. Delete the raw data, but keep the GPU results and all the metadata
        
        I'm having trouble finding the right documentation that describes how I
        can do this. Can you give me a top-level page? (I've looked at the wiki,
        but it's a bit tricky to work out where to start).
        
        K
        
        
        -- 
        KEITH BANNISTER | Principal Research Engineer
        CSIRO Astronomy and Space Science
        T +61 2 9372 4295
        E keith.bannister@csiro.au
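
For readers following the thread, the five steps in Keith's message can be sketched as a plain Python pipeline. This is only an illustration of the ordering and the data that flows between steps; it does not use any OODT or CAS-PGE API. Every name here (`Observation`, `gpu_parameters`, `copy_cmd`, the metadata keys, the rule that maps metadata to GPU parameters) is a hypothetical placeholder, and a dict stands in for the metadata index.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Observation:
    """One raw data file plus its metadata (all field names are hypothetical)."""
    path: str
    metadata: Dict[str, float]
    products: List[str] = field(default_factory=list)
    archived_remotely: bool = False
    raw_deleted: bool = False


def index_metadata(obs: Observation, catalog: Dict[str, dict]) -> None:
    # Step 1: record the raw file's metadata in an index (a dict stands in here).
    catalog[obs.path] = dict(obs.metadata)


def gpu_parameters(metadata: Dict[str, float]) -> Dict[str, float]:
    # Step 2: derive GPU job parameters from the metadata.
    # The mapping below is invented purely for illustration.
    return {
        "n_channels": int(metadata["bandwidth_mhz"]),
        "lat_deg": metadata["lat_deg"],
        "long_deg": metadata["long_deg"],
    }


def run_gpu_job(obs: Observation, params: Dict[str, float]) -> str:
    # Placeholder for the real GPU code: only names the result product.
    product = obs.path + ".result"
    obs.products.append(product)
    return product


def process(obs: Observation, catalog: Dict[str, dict],
            copy_cmd: Callable[[str], bool]) -> str:
    index_metadata(obs, catalog)                       # step 1
    params = gpu_parameters(obs.metadata)              # step 2
    product = run_gpu_job(obs, params)
    # Step 3: archive the result with the raw metadata plus the job parameters.
    catalog[product] = {**obs.metadata, **params, "source": obs.path}
    # Step 4: copy_cmd stands in for a long-running transfer (e.g. bbcp);
    # it is assumed to return True only when the copy has completed.
    obs.archived_remotely = copy_cmd(obs.path)
    # Step 5: delete the raw data only after the remote copy is confirmed.
    if not obs.archived_remotely:
        raise RuntimeError("remote copy not confirmed; keeping raw data")
    obs.raw_deleted = True
    return product
```

The one design point worth noting is the guard before step 5: the raw data is marked for deletion only when the copy step reports success, so a failed transfer leaves the raw file in place while the metadata index is still populated.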
        
    
    
