From: "Verma, Rishi (388J)" <Rishi.Verma@jpl.nasa.gov>
Reply-To: user@oodt.apache.org
Subject: Re: PushPull framework and custom met extraction
Date: Fri, 9 Nov 2012 23:05:28 +0000
In-Reply-To: <8f8f2660-b249-48f2-afd5-b2ce7d510781@me.com>

Hey Brian,

That sounds pretty reasonable. Thanks for your help on this!

rishi

On Nov 9, 2012, at 12:07 PM, Brian Foster wrote:

Hey Rishi,

The filemgr connection from the pushpull is just to check whether the filemgr already has a file, so the pushpull doesn't redownload files (no ingest support)... usually you configure your pushpull daemon to run at longer intervals, but the crawler will usually wake up more often (every 30 seconds is a typical interval for it)... so just have the pushpull download its files to a staging area, i.e. the same directory the crawler is monitoring.

-brian
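A minimal Python sketch of the staging-area hand-off Brian describes, under stated assumptions: the downloader writes into one directory, and a crawler wakes at a fixed interval and picks up any file it has not seen before. `scan_staging` and `crawl_forever` are hypothetical helpers, not part of cas-crawler, and the 30-second default only mirrors the typical interval mentioned above.

```python
import os
import time
from typing import Callable, List, Optional, Set


def scan_staging(staging_dir: str, seen: Set[str]) -> List[str]:
    """One crawler 'wake-up': return files in staging_dir not seen before.

    Hypothetical helper, not cas-crawler's API. Newly found paths are added
    to `seen` so the next scan skips them.
    """
    new_files = []
    for name in sorted(os.listdir(staging_dir)):
        path = os.path.join(staging_dir, name)
        if os.path.isfile(path) and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files


def crawl_forever(staging_dir: str,
                  handle: Callable[[str], None],
                  interval_seconds: float = 30,
                  max_cycles: Optional[int] = None) -> None:
    """Poll staging_dir every interval_seconds, passing each new file to handle.

    max_cycles bounds the loop (handy for testing); None polls indefinitely.
    """
    seen: Set[str] = set()
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for path in scan_staging(staging_dir, seen):
            handle(path)
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_seconds)  # wait until the next wake-up
```

For example, `crawl_forever("/data/staging", handle=print)` would print each new file as it lands (the path is illustrative). Because the crawler only reacts to what is already in the directory, neither daemon needs to know about the other. A real setup would also need some way to skip files that are still being written.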

On Nov 09, 2012, at 11:06 AM, "Verma, Rishi (388J)" <Rishi.Verma@jpl.nasa.gov> wrote:

Hey Brian, Shreyl,

Thanks for your input and clarification on this.

Brian - the delegation of duties you described makes sense. Does cas-pushpull have any way to invoke a local crawl process once downloads complete? I know it has a filemgr hookup, but I wonder whether a crawl process can be invoked after all file downloads via pushpull have completed. The alternative, of course, would be to schedule the crawler daemon to run well after the pushpull daemon finishes its work.

Thanks to both of you for your help!
rishi
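Rishi's alternative, running the crawl only once the downloads are done, can be sketched as two steps chained in sequence instead of two independent daemons. `download_then_crawl`, `download_all`, and `crawl_once` are hypothetical names used only for illustration; the thread does not confirm that cas-pushpull exposes any such completion hook.

```python
from typing import Callable, Iterable, List


def download_then_crawl(download_all: Callable[[], Iterable[str]],
                        crawl_once: Callable[[List[str]], int]) -> int:
    """Run one download pass to completion, then a single crawl over its output.

    download_all and crawl_once are hypothetical stand-ins for the pushpull
    and crawler steps; this is not an OODT API. Returns whatever count
    crawl_once reports (e.g. files processed), or 0 if nothing was downloaded.
    """
    downloaded = list(download_all())  # blocks until every download finishes
    if not downloaded:
        return 0  # nothing new, so skip the crawl entirely
    return crawl_once(downloaded)
```

The trade-off versus polling a staging directory: nothing is crawled until the whole batch finishes, but the crawler never sees a half-finished download.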

On Nov 9, 2012, at 10:08 AM, Brian Foster wrote:


Hey Rishi,

You will need to use both cas-pushpull and cas-crawler to accomplish this...

cas-pushpull: Used for downloading files from remote sites to your local systems... the .tmp files contain cas-pushpull's known metadata, and you can configure which of the known metadata gets written out, or whether a .tmp file gets created at all... however, you can add custom metadata fields to it.

cas-crawler: Allows for metadata extraction (custom metadata) from files on your local system... and then allows you to ingest them into the filemgr (ingestion can optionally be turned off)
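The .tmp behaviour described for cas-pushpull (choose which known metadata keys are written, optionally skip the file, optionally add extra fields) can be sketched as a simple filter. This is a hypothetical illustration: `write_sidecar` is not a cas-pushpull function, the `product + ".tmp"` naming is assumed, and JSON is a placeholder serialization, not the format cas-pushpull actually writes.

```python
import json
from typing import Dict, Iterable, Optional


def write_sidecar(product_path: str,
                  known_metadata: Dict[str, str],
                  keys_to_write: Iterable[str],
                  create_file: bool = True,
                  extra_fields: Optional[Dict[str, str]] = None,
                  suffix: str = ".tmp") -> Optional[str]:
    """Write a filtered metadata sidecar next to a downloaded product.

    Hypothetical sketch: only keys listed in keys_to_write (and present in
    known_metadata) are kept, extra_fields are merged on top, and nothing
    is written when create_file is False. Returns the sidecar path or None.
    """
    if not create_file:
        return None
    selected = {k: known_metadata[k] for k in keys_to_write if k in known_metadata}
    if extra_fields:
        selected.update(extra_fields)  # custom fields added alongside known ones
    sidecar = product_path + suffix
    with open(sidecar, "w") as f:
        json.dump(selected, f, indent=2, sort_keys=True)
    return sidecar
```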

HTH
-brian
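The cas-crawler side, per-file custom extraction followed by an optional ingest, might look like the following in outline. In OODT this extraction would be done by a Java MetExtractor; the Python below only mirrors the control flow, and `filename_extractor`, its `<mission>_<date>.<ext>` naming rule, `crawl`, and the `ingest` callback are all hypothetical.

```python
import os
from typing import Callable, Dict, List

Metadata = Dict[str, str]


def filename_extractor(path: str) -> Metadata:
    """Hypothetical custom extractor: derives metadata from name and size.

    Assumes file names shaped like <mission>_<date>.<ext>; a real extractor
    would implement whatever rules the product type needs.
    """
    base = os.path.basename(path)
    stem, ext = os.path.splitext(base)
    mission, _, date = stem.partition("_")
    return {
        "Filename": base,
        "Mission": mission,
        "Date": date,
        "Extension": ext.lstrip("."),
        "FileSize": str(os.path.getsize(path)),
    }


def crawl(paths: List[str],
          extract: Callable[[str], Metadata],
          ingest: Callable[[str, Metadata], None],
          ingest_enabled: bool = True) -> Dict[str, Metadata]:
    """Extract metadata for each local file and, if enabled, pass it to ingest."""
    results: Dict[str, Metadata] = {}
    for path in paths:
        met = extract(path)
        results[path] = met
        if ingest_enabled:  # ingestion into the filemgr can be switched off
            ingest(path, met)
    return results
```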

On Nov 08, 2012, at 06:11 PM, "Verma, Rishi (388J)" <Rishi.Verma@jpl.nasa.gov> wrote:

Hi All -

I'm wondering if anyone has experience with, or knows the details of how to use custom MetExtractors on products that are remotely downloaded via PushPull.

By default, PushPull performs some basic met-extraction and creates a ".tmp" file associated with downloaded products, but I'm wondering whether this met generation step is customizable.

I've looked through the configuration files (e.g. [1], [2]) as well as the code for PushPull, but I can't seem to locate configuration parameters to support the invocation of custom met extractors on downloaded data.

If any of you have experience with this, or can point me to where to look, I'd really appreciate it.

Thanks! 
Rishi 

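--
[1] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties
[2] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/examples/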