oodt-dev mailing list archives

From "Verma, Rishi (388J)" <Rishi.Ve...@jpl.nasa.gov>
Subject Re: PushPull framework and custom met extraction
Date Fri, 09 Nov 2012 23:05:28 GMT
Hey Brian,

That sounds pretty reasonable. Thanks for your help on this!


On Nov 9, 2012, at 12:07 PM, Brian Foster wrote:

Hey Rishi,

The filemgr connection from the pushpull is just to verify whether the filemgr already has a file,
so the pushpull doesn't redownload files (no ingest support)... usually you configure your
pushpull daemon to run at longer intervals, but the crawler will usually wake up more
often (every 30 seconds is a typical interval for it)... so just have the pushpull download
its files to a staging area, which is the same directory the crawler is monitoring.
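
To illustrate the staging-area handoff described above, here is a minimal sketch in plain Python. It does not use any OODT API: the function names, the `handle` callback and the `max_cycles` parameter are all hypothetical, and treating ".tmp" as a suffix on the met file name is an assumption based only on this thread. It shows the general idea of one process dropping files into a directory while another polls that directory on a short interval.

```python
import os
import time


def find_new_products(staging_dir, seen, met_suffix=".tmp"):
    """Return files in staging_dir that have not been handled yet.

    Files ending in met_suffix are skipped, on the assumption that they
    are metadata files written next to the downloaded products.
    """
    new_paths = []
    for name in sorted(os.listdir(staging_dir)):
        path = os.path.join(staging_dir, name)
        if not os.path.isfile(path) or name.endswith(met_suffix):
            continue
        if path not in seen:
            seen.add(path)
            new_paths.append(path)
    return new_paths


def poll(staging_dir, handle, interval_seconds=30, max_cycles=None):
    """Check staging_dir every interval_seconds and pass new files to handle.

    max_cycles is a hypothetical knob to make the loop stoppable in tests;
    with the default of None the loop runs until the process is killed.
    """
    seen = set()
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for path in find_new_products(staging_dir, seen):
            handle(path)
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_seconds)
```

In this sketch `handle` stands in for whatever the crawler does with a file (met extraction, then optional ingest). A real setup would also need to make sure a file is fully downloaded before it is picked up, which the sketch does not address.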


On Nov 09, 2012, at 11:06 AM, "Verma, Rishi (388J)" <Rishi.Verma@jpl.nasa.gov> wrote:

Hey Brian, Shreyl,

Thanks for your input and clarification on this.

Brian - the delegation of duties you described makes sense. Does cas-pushpull have any way
to invoke a local crawl process following completion of downloads? I know it has a filemgr
hookup, but I wonder whether a crawl process can be invoked once all file downloads via
pushpull have completed. The alternative, of course, would be
to schedule the crawler daemon to run well after the pushpull daemon finishes its work.

Thanks to both of you for your help!

On Nov 9, 2012, at 10:08 AM, Brian Foster wrote:

Hey Rishi,

You will need to use both cas-pushpull and cas-crawler to accomplish this...

cas-pushpull: Used for downloading files from remote sites to your local systems... the
.tmp files contain cas-pushpull's known metadata and you can configure which of the known
metadata gets written out or if a .tmp file gets created at all... however you can add custom
metadata fields to it.

cas-crawler: Allows for metadata extraction (custom metadata) from files on your local system...
and then allows you to ingest them into the filemgr (optionally can be turned off)


On Nov 08, 2012, at 06:11 PM, "Verma, Rishi (388J)" <Rishi.Verma@jpl.nasa.gov> wrote:

Hi All -

I'm wondering if anyone has experience with, or knows the details of how to use custom MetExtractors
on products that are remotely downloaded via PushPull.

By default, PushPull performs some basic met-extraction and creates a ".tmp" file associated
with downloaded products, but I'm wondering whether this met generation step is customizable.

I've looked through the configuration files (e.g. [1], [2]) as well as the code for PushPull,
but I can't seem to locate configuration parameters to support the invocation of custom met
extractors on downloaded data.

If any of you have experience with this, or can point me to where to look, I'd really appreciate
it.


[1] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties

[2] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/examples/
