From holenoter <holeno...@mac.com>
Subject Re: cas-crawler does not pass preconditions
Date Wed, 01 Jun 2011 21:20:00 GMT
hey thomas,

you are using StdProductCrawler, which assumes a *.met file already exists for each file (it
has only one precondition, which is the existence of the *.met file) . . . if you want a *.met
file generated you will have to use one of the other 2 crawlers.  running: ./crawler_launcher
-psc will give you a list of supported crawlers.  you can then run: ./crawler_launcher -h
-cid <crawler_id> where crawler_id is one of the ids from the previous command . . .
unfortunately i don't think the other crawlers are documented all that extensively . . . MetExtractorProductCrawler
will use a single extractor for all files . . . AutoDetectProductCrawler requires a mapping
file to be filled out and mime-types defined
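
if you want to check the StdProductCrawler precondition yourself before launching, something like the python sketch below lists the products that have no met file next to them. it assumes the met file is named by appending ".met" to the full product filename (e.g. 1263940095.h5.met); that naming and the helper name files_missing_met are assumptions for illustration, not taken from the crawler source:

```python
import os


def files_missing_met(product_dir, met_extension="met"):
    """Return product files in product_dir with no sibling <name>.<met_extension> file.

    Assumption: the met file sits next to the product and is named by
    appending "." + met_extension to the product's full filename.
    """
    suffix = "." + met_extension
    missing = []
    for name in sorted(os.listdir(product_dir)):
        path = os.path.join(product_dir, name)
        # Skip sub-directories and the met files themselves.
        if not os.path.isfile(path) or name.endswith(suffix):
            continue
        if not os.path.isfile(path + suffix):
            missing.append(path)
    return missing
```

calling files_missing_met on the --productPath directory should return an empty list if every product already has its met file; anything it returns is a candidate for the "Failed to pass preconditions" warning.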

* MetExtractorProductCrawler example configuration can be found in the source:
 - allows you to specify how the crawler will run your extractor

* AutoDetectProductCrawler example configuration can be found in the source:
 - uses the same metadata extractor specification file (you will have one of these for each
 - allows you to define your mime-types -- that is, give a mime-type for a given filename
regular expression

   - your file might look something like:
		<mime-type type="product/hdf5">
			<glob pattern="*.h5"/>
		</mime-type>
 - maps your mime-types to extractors
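
to illustrate the idea behind the glob entries (a filename pattern decides the mime-type, and the mime-type then decides the extractor), here is a rough python sketch. it is not how the crawler implements detection; the MIME_GLOBS table and the detect_mime_type helper are hypothetical and only mirror the "product/hdf5" / "*.h5" example above:

```python
from fnmatch import fnmatch

# Hypothetical table mirroring <mime-type type="..."><glob pattern="..."/> entries.
MIME_GLOBS = [
    ("product/hdf5", "*.h5"),
]


def detect_mime_type(filename, mime_globs=MIME_GLOBS):
    """Return the first mime-type whose glob pattern matches filename, else None."""
    for mime_type, pattern in mime_globs:
        if fnmatch(filename, pattern):
            return mime_type
    return None
```

with this table, detect_mime_type("1263940095.h5") gives "product/hdf5", while a file that matches no pattern gives None, i.e. no extractor would be selected for it.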

Hope this helps . . .

On Jun 01, 2011, at 12:54 PM, Thomas Bennett <tbennett@ska.ac.za> wrote:


I've successfully got the CmdLineIngester working with an ExternMetExtractor (written in python):

However, when I try to launch the crawler I get a warning telling me that the preconditions for
ingest have not been met. No .met file has been created.

Two questions:
1) I'm just wondering if there is any configuration that I'm missing.
2) Where should I start hunting in the code or logs to find out why my met extractor was not

Kind regards,

For your reference, here is the command and output.

bin$ ./crawler_launcher --crawlerId StdProductCrawler --productPath /usr/local/meerkat/data/staging/products/hdf5
--filemgrUrl http://localhost:9000 --failureDir /tmp --actionIds DeleteDataFile MoveDataFileToFailureDir
Unique --metFileExtension met --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
--metExtractor org.apache.oodt.cas.metadata.extractors.ExternMetExtractor --metExtractorConfig
Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
INFO: Crawling /usr/local/meerkat/data/staging/products/hdf5
Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
INFO: Handling file /usr/local/meerkat/data/staging/products/hdf5/1263940095.h5
Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
WARNING: Failed to pass preconditions for ingest of product: [/usr/local/meerkat/data/staging/products/hdf5/1263940095.h5]

