oodt-dev mailing list archives

From "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov>
Subject BigTranslate: another Apache OODT-based app
Date Mon, 04 Jul 2016 03:22:32 GMT
Hey Folks,

If you want another (like DRAT [1]) turnkey Apache OODT application,
take a look at BigTranslate [2]. I’ve given it a full makeover. There
are some lingering things I want to do, like Docker and making some
things a bit easier to take care of, but it’s pretty much done, and it is
churning again, translating the 190M-row DARPA XDATA employment data
right now.

I welcome any and all contributions. A few things to note:

1. BigTranslate was inspired by two blog / wiki posts on understanding
Apache OODT metadata, especially during pipeline processing:


These are useful posts and, if we are doing a website redesign, they
should be emphasized, as they really help in understanding what’s going
on during large-scale Apache OODT processing.

2. There are a bunch of TODOs and to-be-filed issues for Apache
OODT that I found while fixing and productionizing BigTranslate.
In no particular order, they are:

* change PathUtils#getEnv to use System.getenv
* change PathUtils#getEnv to be static and only load the properties once per JVM
* investigate cas-pge: a valueless key with workflowMet should push into workflow met
with the existing value and not look for key-ref
* update cas-crawler to say which preconditions failed on crawling
* create better error messages when crawler actions fail
* radix query tool path needs better deployment
* sortBy in Query tool is broken b/c of an unsupported operation exception
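
To make the first two items concrete, here is a rough sketch of a static
environment lookup that reads System.getenv only once per JVM. This is not
the actual PathUtils code: the EnvCache class, its Holder, and both getEnv
signatures are hypothetical names used only for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; NOT the real OODT PathUtils.
final class EnvCache {

    // Initialization-on-demand holder: the JVM runs this static
    // initializer at most once, on first use, in a thread-safe way.
    private static final class Holder {
        static final Map<String, String> ENV = Collections.unmodifiableMap(
                new HashMap<String, String>(System.getenv()));
    }

    private EnvCache() {
        // static-only utility class
    }

    // Returns the value of the named environment variable, or null if unset.
    static String getEnv(String name) {
        return Holder.ENV.get(name);
    }

    // Same lookup, but returns the given fallback when the variable is unset.
    static String getEnv(String name, String fallback) {
        String value = Holder.ENV.get(name);
        return value != null ? value : fallback;
    }
}
```

The holder idiom keeps the lookup static and avoids re-reading the
environment on every call, which is the behavior the second item asks for.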

I’ll be filing the above issues and fixing them in the 1.x branch and
in 2.x going forward over the next week.

Comments and improvements are welcome in BigTranslate! Also, maybe
we should make a wiki page that lists our full end-to-end, usable
apps like BigTranslate and DRAT.


[1] http://github.com/chrismattmann/drat/
[2] http://github.com/chrismattmann/bigtranslate/

Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
