oodt-dev mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: Question about metadata extraction and file manager
Date Wed, 30 Mar 2011 20:55:23 GMT
Hi Luca,

Thanks for your questions! Answers inline below:

> I have a question about how to use the metadata extracted by an implementation
> of "FilemgrMetExtractor" when versioning a file that is ingested by the File Manager.
>
> I have configured my File Manager to run a custom metadata extractor ("NetCDFMetExtractor")
> and a custom versioner ("DRSVersioner"). I am running the filemgr-client tool to ingest a
> netcdf file, please see the log below.
>
> o First question: why is the versioner run twice? It seems like the first time it is
> run, it has access to all the metadata that has been previously extracted by the
> NetCDFMetExtractor, but the second time it doesn't?


What does your NetCDFMetExtractor do? Does it call the DRSVersioner? How are you wiring the
two together? I see from your command line below that you are attaching these to the GenericFile
product type. How did you set that up? Can I see an example of your policy files? That will
help to diagnose what you're seeing.

> 
> o Second question: what is the relation between the metadata extracted by the
> NetCDFMetExtractor and the argument to the --metadataFile option? Is there any way to
> serialize the output of the metadata extractor to a file that is then ingested by the
> file manager?

The policy files will help with this. My guess is that you made the NetCDFMetExtractor a *server
side* met extractor. You have the ability to do *client side* or *server side* extraction.
On the FM client side, the metadata is either pre-baked (passed in via the --metadataFile
param), or generated on the fly by an o.a.oodt.cas.metadata.extractors.CmdLineMetExtractor
implementation and piped in via the StdIngester or via the crawler (AutoDetect or MetExtractor).

Server side met is *derived after* the original client side met is sent along during the process
of ingestion.
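
To make that ordering concrete, here is a rough toy model in Python. It is only a sketch: every name in it (ingest, netcdf_like_extractor, the metadata keys, the file path) is hypothetical and none of it is the actual File Manager API. It assumes server side extractors run one after another, each seeing the met gathered so far, and that a server-derived key wins on conflict. This thread does not confirm either assumption.

```python
# Toy model of the ingestion order described above. Every name here is
# hypothetical; none of this is the OODT File Manager API.
from typing import Callable, Dict, List

Metadata = Dict[str, str]
Extractor = Callable[[str, Metadata], Metadata]


def ingest(product_path: str, client_met: Metadata,
           server_extractors: List[Extractor]) -> Metadata:
    """Return the metadata held after ingestion in this toy model."""
    # Start from a copy of whatever the client sent along
    # (e.g. the contents of a pre-baked metadata file).
    met = dict(client_met)
    for extract in server_extractors:
        # Assumption: each server side extractor sees the met gathered so
        # far, and its keys overwrite existing ones on conflict.
        met.update(extract(product_path, met))
    return met


def netcdf_like_extractor(path: str, met: Metadata) -> Metadata:
    # Stand-in for a server side extractor; it only inspects the file name.
    return {"Format": "netcdf", "SourceFile": path.rsplit("/", 1)[-1]}


client_met = {"ProductType": "GenericFile"}
final_met = ingest("/data/tas_2000.nc", client_met, [netcdf_like_extractor])
print(final_met)
```

The point of the sketch is only the order of operations: the client side met exists before ingestion starts, and anything a server side extractor produces is layered on top of it afterwards.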

HTH,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

