oodt-dev mailing list archives

From Chris Mattmann <mattm...@apache.org>
Subject Re: Running operations over data
Date Sun, 02 Mar 2014 07:18:49 GMT
Great reply, Cam.



-----Original Message-----
From: Cameron Goodale <sigep311@gmail.com>
Reply-To: "user@oodt.apache.org" <user@oodt.apache.org>
Date: Wednesday, February 26, 2014 10:32 PM
To: "user@oodt.apache.org" <user@oodt.apache.org>
Subject: Re: Running operations over data

>Hey Tom,
>
>
>TLDR - Crawler ships with some actions, but you can write your own
>actions, and those actions can be wired into PreIngestion or
>PostIngestion.  FileManager has MetExtractors that run before ingestion;
>traditionally they are meant to extract metadata (as the name implies),
>but you could just as easily have one run a checksum and store the result
>in metadata, or convert an incoming file into a PDF and then ingest the
>PDF.
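>
>To make that concrete, here is a rough sketch of a custom crawler action
>that checksums a product before ingest. This is just an illustration: the
>class name and the "Checksum" met key are made up, and the package paths
>follow the cas-crawler layout as I remember it, so check them against
>your OODT version. You would wire the action in via a bean in the
>crawler's action-beans.xml and attach it to the preIngest phase.
>
>  import java.io.File;
>  import java.io.FileInputStream;
>  import java.io.InputStream;
>  import java.security.MessageDigest;
>
>  import org.apache.oodt.cas.crawl.action.CrawlerAction;
>  import org.apache.oodt.cas.crawl.structs.exceptions.CrawlerActionException;
>  import org.apache.oodt.cas.metadata.Metadata;
>
>  // Hypothetical pre-ingest action: MD5 the product, stash it in metadata.
>  public class ChecksumAction extends CrawlerAction {
>
>    @Override
>    public boolean performAction(File product, Metadata metadata)
>        throws CrawlerActionException {
>      try (InputStream in = new FileInputStream(product)) {
>        MessageDigest md5 = MessageDigest.getInstance("MD5");
>        byte[] buf = new byte[8192];
>        int n;
>        while ((n = in.read(buf)) != -1) {
>          md5.update(buf, 0, n);
>        }
>        StringBuilder hex = new StringBuilder();
>        for (byte b : md5.digest()) {
>          hex.append(String.format("%02x", b));
>        }
>        metadata.addMetadata("Checksum", hex.toString()); // made-up met key
>        return true; // returning false vetoes the ingest
>      } catch (Exception e) {
>        throw new CrawlerActionException("checksum failed: " + e.getMessage());
>      }
>    }
>  }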
>
>On the Snow Data System here at JPL we have a lights-out operation that
>might be of interest, so I will try to explain it below.
>
>
>1.  Every hour OODT PushPull wakes up and tries to download new data from
>a Near Real Time Satellite Imagery service via FTP
>(http://lance-modis.eosdis.nasa.gov/).
>2.  Every 20 minutes OODT Crawler wakes up and crawls the local file
>staging area where PushPull deposits the satellite images.
>3.  When the crawler encounters files that have been downloaded and are
>ready for ingestion, things get interesting.  During the crawl several
>preconditions need to be met (the file cannot already be in the catalog,
>guarding against duplicates; the file has to be of the correct mime-type;
>etc.).
>4.  If the preconditions pass, Crawler ingests the file(s) into the OODT
>FileManager, but things don't stop there.
>5.  Crawler has a post-ingest success hook that we leverage: the
>"TriggerPostIngestWorkflow" action automatically submits an event to the
>Workflow Manager (see the sketch after this list).
>6.  OODT Workflow Manager receives the event (in this example it would be
>"MOD09GANRTIngest") and boils it down into the tasks that need to run.
>7.  Workflow Manager then sends these tasks to the OODT Resource Manager,
>which farms the jobs out to batch stubs running across 4 different
>machines.
>8.  When the jobs complete, Crawler ingests the final outputs back into
>the FileManager.
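>
>For the curious, the "TriggerPostIngestWorkflow" hook in step 5 boils
>down to an XML-RPC call to the Workflow Manager. A minimal hand-rolled
>equivalent might look like the sketch below, assuming the stock
>XmlRpcWorkflowManagerClient and the conventional Workflow Manager port;
>the URL, event name, and met key are example values, not gospel.
>
>  import java.net.URL;
>
>  import org.apache.oodt.cas.metadata.Metadata;
>  import org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManagerClient;
>
>  public class SendIngestEvent {
>    public static void main(String[] args) throws Exception {
>      // Connect to a running Workflow Manager (9001 is the usual port).
>      XmlRpcWorkflowManagerClient wm =
>          new XmlRpcWorkflowManagerClient(new URL("http://localhost:9001"));
>
>      // Metadata travels with the event into the workflow tasks.
>      Metadata met = new Metadata();
>      met.addMetadata("ProductType", "MOD09GANRT"); // example key/value
>
>      // The event name is mapped to a workflow in the WM's policy files.
>      wm.sendEvent("MOD09GANRTIngest", met);
>    }
>  }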
>
>
>Hope that helps.
>
>
>Best Regards,
>
>
>
>
>Cameron
>
>
>
>On Tue, Feb 25, 2014 at 1:47 PM, Tom Barber
><tom.barber@meteorite.bi> wrote:
>
>Hello folks,
>
>I'm preparing for this talk, so I figure I should probably work out how
>OODT works... ;)
>
>Anyway, I have some ideas about how to integrate some more non-science
>tools into OODT, but I'm still figuring out some of the components;
>namely, workflows.
>
>
>If, for example, in the OODT world I wanted to ingest a bunch of data and
>perform some operation on it, does this happen during the ingest phase or
>post-ingest?
>
>Normally, I guess, you guys would write some crazy scientific stuff to
>analyse the data you're ingesting and then dump it into the catalog in
>some different format; does that sound about right?
>
>Thanks
>
>Tom
>-- 
>Tom Barber | Technical Director
>
>meteorite bi
>T: +44 20 8133 3730
>W: www.meteorite.bi | Skype: meteorite.consulting
>A: Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK
>
>-- 
>
>Sent from a Tin Can attached to a String
>


