oodt-dev mailing list archives

From "Verma, Rishi (388J)" <Rishi.Ve...@jpl.nasa.gov>
Subject Re: Registering a custom ProductCrawler with cas-crawler
Date Thu, 26 Apr 2012 20:25:56 GMT
Per Chris' suggestion, I'm looking at writing a custom pre-ingest action or
pre-ingest comparator instead of creating a whole new ProductCrawler. This
might be a more lightweight solution.

In any case, thanks for the tips, Brian and Chris!

rishi

On 4/26/12 2:06 AM, "Brian Foster" <holenoter@me.com> wrote:

>Never mind. It looks like you are using 0.3 instead of trunk; what I
>added applies to the trunk crawler.
>
>-Brian
>
>On Apr 25, 2012, at 4:36 PM, "Verma, Rishi (388J)"
><Rishi.Verma@jpl.nasa.gov> wrote:
>
>> Hi all,
>> 
>> I wrote a custom cas-crawler ProductCrawler, but I'm having some
>>difficulty registering my custom product crawler with cas-crawler.
>> 
>> I created a product crawler by extending StdProductCrawler, and I've
>>added this product-crawler name to crawler config files (following the
>>example of StdProductCrawler):
>> * crawler/policy/crawler-beans.xml
>> * crawler/policy/cmd-line-option-beans.xml
>> 
>> However, after running the command below, I can see that my custom
>>product crawler (called LabCASProductCrawler) is not available. An
>>attempted ingest also reports that no bean named
>>"LabCASProductCrawler" is defined:
>>> bash-3.2$ ./crawler_launcher --printSupportedCrawlers
>> ProductCrawlers:
>>  Id: StdProductCrawler
>>  Id: MetExtractorProductCrawler
>>  Id: AutoDetectProductCrawler
>> 
>>> ./crawler_launcher --crawlerId LabCASProductCrawler --filemgrUrl
>>>http://localhost:9000 --productPath /data/staging/HGHAGA9 --failureDir
>>>/tmp/failed_ingest --metFileExtension met --clientTransferer
>>>org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
>> Failed to parse options : No bean named 'LabCASProductCrawler' is
>>defined
>> 
>> I noticed that files like crawler-config.xml and
>>cmd-line-option-beans.xml reference crawler config files stored in the
>>cas-crawler JAR. Looking into this further, it seems to me that the
>>crawler is loading config files directly from that JAR, which
>>overrides any of my config changes:
>> * crawler/lib/cas-crawler-0.3.jar:org/apache/oodt/cas/crawl/crawler-beans.xml
>> * crawler/lib/cas-crawler-0.3.jar:org/apache/oodt/cas/crawl/crawler-config.xml
>> 
>> So two questions:
>> 1. Am I editing the correct policy files, in order to register my
>>custom product crawler with cas-crawler?
>> 2. It seems the cas-crawler JAR contains crawler config files that take
>>precedence over the ones available for editing under
>>crawler/policy. Is there a way around this?
>> 
>> Thanks!
>> rishi

