From: "Verma, Rishi (388J)"
To: "dev@oodt.apache.org"
Subject: Registering a custom ProductCrawler with cas-crawler
Date: Wed, 25 Apr 2012 23:36:00 +0000

Hi all,

I wrote a custom cas-crawler ProductCrawler, but I'm having some difficulty registering it with cas-crawler.

I created the product crawler by extending StdProductCrawler, and I've added its name to the crawler config files (following the example of StdProductCrawler):

* crawler/policy/crawler-beans.xml
* crawler/policy/cmd-line-option-beans.xml

However, after running the first command below, I can see that my custom product crawler (called LabCASProductCrawler) is not available. An attempted ingest also tells me that there is no "bean" named "LabCASProductCrawler":

> bash-3.2$ ./crawler_launcher --printSupportedCrawlers
ProductCrawlers:
  Id: StdProductCrawler
  Id: MetExtractorProductCrawler
  Id: AutoDetectProductCrawler

> ./crawler_launcher --crawlerId LabCASProductCrawler --filemgrUrl http://localhost:9000 --productPath /data/staging/HGHAGA9 --failureDir /tmp/failed_ingest --metFileExtension met --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
Failed to parse options : No bean named 'LabCASProductCrawler' is defined

I noticed that files like crawler-config.xml and cmd-line-option-beans.xml reference crawler config files stored in the cas-crawler JAR.
Looking more into this, it seems to me that the crawler is pre-loading config files directly from that JAR, and those copies override any of my config changes:

* crawler/lib/cas-crawler-0.3.jar:org/apache/oodt/cas/crawl/crawler-beans.xml
* crawler/lib/cas-crawler-0.3.jar:org/apache/oodt/cas/crawl/crawler-config.xml

So two questions:

1. Am I editing the correct policy files in order to register my custom product crawler with cas-crawler?
2. It seems the cas-crawler JAR contains crawler config files that take precedence over the ones available for editing under crawler/policy. Is there a way around this?

Thanks!
rishi
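
One way to test the suspicion that the copies inside the JAR are the ones being loaded is to ask the class loader where each resource name resolves from. The following is only a minimal sketch, assuming the crawler finds these files through the classpath; the class name ResourceLocator is hypothetical and not part of OODT, and the two resource paths are the ones listed above. It would need to be run with the same classpath that crawler_launcher uses.

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of OODT): lists every classpath location
// that provides a given resource name, in class loader search order.
public class ResourceLocator {

    // Returns all URLs on the system classpath that match resourceName.
    static List<URL> locate(String resourceName) throws IOException {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        return new ArrayList<>(Collections.list(loader.getResources(resourceName)));
    }

    public static void main(String[] args) throws IOException {
        // Resource names taken from the JAR paths quoted in the message.
        String[] names = {
            "org/apache/oodt/cas/crawl/crawler-beans.xml",
            "org/apache/oodt/cas/crawl/crawler-config.xml"
        };
        for (String name : names) {
            List<URL> hits = locate(name);
            System.out.println(name + ": " + hits.size() + " match(es)");
            for (URL url : hits) {
                System.out.println("  " + url);
            }
        }
    }
}
```

If every URL printed starts with jar: and points into cas-crawler-0.3.jar, that would suggest the edited files under crawler/policy are not visible under those resource names, which is consistent with the behavior described above.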