oodt-user mailing list archives

From "Nguyen, Ricky" <rngu...@chla.usc.edu>
Subject Re: MetadataBasedFileVersioner and custom metExtractor
Date Thu, 03 Nov 2011 06:44:43 GMT
Same issue found here:
http://mail-archives.apache.org/mod_mbox/oodt-user/201103.mbox/%3c3BD372C8-B2D6-4A67-8566-3C1EA93A2340@jpl.nasa.gov%3e

> o First question: why is the versioner run twice ?

FinalFileLocationExtractor runs the versioner during "met extraction" phase of ingestion,
but doesn't persist the datastore references. The second run is the actual "versioning" phase
with datastore persistence.

> It seems like the first time it is run,
> it has access to all the metadata that has been previously extracted by the NetCDFMetExtractor,
> but the second time it doesn't ?

Exactly what I'm seeing. This explains why the FileLocation met is persisted correctly (from the
1st run), but the datastore reference is incorrect (from the 2nd run).
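
A minimal sketch of that two-pass behavior, in plain Java with no OODT classes. Everything
here is hypothetical: buildReference stands in for the versioner, the maps stand in for the
met visible on each run, and the archive root is a placeholder. It assumes the versioner
fills the path by simple string substitution, which would account for a reference ending in
"MRN_null/null".

```java
import java.util.HashMap;
import java.util.Map;

class TwoPassVersioningSketch {

    // Hypothetical stand-in for a metadata-based versioner: builds
    // <root>/MRN_<MRN>/<Filename> from whatever met it is handed.
    static String buildReference(String root, Map<String, String> met) {
        // Java string concatenation renders a missing key as the text "null".
        return root + "/MRN_" + met.get("MRN") + "/" + met.get("Filename");
    }

    public static void main(String[] args) {
        String root = "file:/archive"; // placeholder archive root

        // Run 1 (met extraction phase): server-extracted met is visible.
        Map<String, String> extracted = new HashMap<>();
        extracted.put("MRN", "1010209");
        extracted.put("Filename", "vps_demog.csv");
        System.out.println("run 1: " + buildReference(root, extracted));

        // Run 2 (versioning phase): the server-extracted keys are absent.
        Map<String, String> withoutServerMet = new HashMap<>();
        System.out.println("run 2: " + buildReference(root, withoutServerMet));
    }
}
```

Run 1 yields a path containing the MRN, while run 2 yields file:/archive/MRN_null/null,
the same shape as the bad reference in the product dump.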

I guess Luca ended up using client-side (crawler) met extractors in order to fill met elements
prior to ingestion.

In short:
(1) client-side metExtractor + versioner = all client-extracted met is available to the versioner
(2) server-side metExtractor + versioner = server-extracted met is NOT available to the versioner
(unless, as Chris suggested, versioner re-runs server-side metExtractor)
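
A rough sketch of that workaround, again in plain Java rather than the OODT API. The names
withExtractedMet, buildReference, and toyExtractor are all hypothetical, and the MRN value is
hard-coded where a real extractor would read the file. The idea is only that the versioner
re-derives the met it needs before computing the reference, and fails loudly instead of
writing "null" into the path.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class ReExtractingVersionerSketch {

    // Hypothetical: re-run the extractor inside the versioner and merge the
    // result with the met the versioner was given (given met wins).
    static Map<String, String> withExtractedMet(
            Map<String, String> given,
            String filePath,
            Function<String, Map<String, String>> extractor) {
        Map<String, String> merged = new HashMap<>(extractor.apply(filePath));
        merged.putAll(given);
        return merged;
    }

    // Hypothetical versioner logic: refuse to build a path from missing met.
    static String buildReference(String root, Map<String, String> met) {
        String mrn = met.get("MRN");
        String filename = met.get("Filename");
        if (mrn == null || filename == null) {
            throw new IllegalStateException("MRN and Filename are required for versioning");
        }
        return root + "/MRN_" + mrn + "/" + filename;
    }

    // Toy extractor: the MRN is a placeholder, not read from the file.
    static Map<String, String> toyExtractor(String filePath) {
        Map<String, String> met = new HashMap<>();
        met.put("MRN", "1010209");
        met.put("Filename", filePath.substring(filePath.lastIndexOf('/') + 1));
        return met;
    }

    public static void main(String[] args) {
        Map<String, String> given = new HashMap<>(); // no server-extracted met
        Map<String, String> met = withExtractedMet(
                given, "/policy/cerner/vps_demog.csv",
                ReExtractingVersionerSketch::toyExtractor);
        System.out.println(buildReference("file:/archive", met));
    }
}
```

The cost of this approach is that extraction runs twice per product, once in the normal
server-side phase and once inside the versioner.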

Is (2) expected behavior?
-Ricky

On Nov 2, 2011, at 10:16 PM, Mattmann, Chris A (388J) wrote:

> Hi Ricky,
> 
> You're running into the issue of where/when Versioning is done. 
> 
> Right now you are using a server-side met extractor -- that metadata is extracted on the server side, cataloged,
> but is _not_ passed back to the client, for use in client-side versioning (which I'm guessing you're using).
> 
> One way around this is to take an approach similar to the FinalFileLocationExtractor -- that is: make your
> versioner run the server side met extractor as part of its versioning process to derive the same metadata
> that you want used for versioning. Or, alternatively, bake in somehow (to the metadata stream that you
> use in read-only form in the versioner) the field that you are interested in flowing through.
> 
> HTH!
> 
> Cheers,
> Chris
> 
> On Nov 2, 2011, at 4:20 PM, Nguyen, Ricky wrote:
> 
>> Hi,
>> 
>> My MetadataBasedFileVersioner can't see the met produced by my custom metExtractor
>> 
>> I've read OODT-72. That issue describes using the Versioner's calculated Reference to assign Metadata (ver -> met).
>> My issue is the opposite direction, using extracted Metadata in the Versioner's Reference calculation.
>> 
>> For example, suppose my metExtractor assigns a value to the "MRN" element. Then I want my versioner
>> to create a datastore reference at "/[MRN]/[Filename]".
>> 
>> My product-types.xml (abbreviated):
>> <type name="CustomProdType">
>> <versioner class="CustomMetBasedFileVersioner"/>
>> <extractor class="CoreMetExtractor"/>
>> <extractor class="MimeTypeExtractor"/>
>> <extractor class="MRNExtractor"/>
>> <extractor class="FinalFileLocationExtractor"/>
>> </type>
>> 
>> After I ingest the file, I dump the met (using MetadataDumper) and the product (using ProductDumper).
>> The met looks fine:
>> <key>FileLocation</key>
>> <val>%2FUsers%2Frnguyen%2Fvpicu%2Fdata%2Farchive%2FMRN_1010209</val>
>> 
>> But the product reference doesn't:
>> <reference dataStore="file:/Users/rnguyen/vpicu/data/archive/MRN_null/null" orig="file:///Users/rnguyen/vpicu/components/filemgr/policy/cerner/vps_demog.csv" size="1114427"/>
>> 
>> Is this an issue? Or am I not using the components correctly? Is there a better way to achieve what I want?
>> 
>> Thanks,
>> Ricky
>> 
>> 
>> ---------------------------------------------------------------------
>> CONFIDENTIALITY NOTICE: This e-mail message, including any attachments, 
>> is for the sole use of the intended recipient(s) and may contain confidential
>> or legally privileged information. Any unauthorized review, use, disclosure
>> or distribution is prohibited. If you are not the intended recipient, please
>> contact the sender by reply e-mail and destroy all copies of this original message.
 
>> 
>> ---------------------------------------------------------------------
>> 
> 
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 





