oodt-user mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: MetadataBasedFileVersioner and custom metExtractor
Date Sat, 05 Nov 2011 04:30:36 GMT
Hi Ricky,

On Nov 3, 2011, at 2:04 PM, Nguyen, Ricky wrote:

> Cool. Thanks Chris. Briefly, what's the reasoning behind this design decision?

No problem. The rationale mainly has to do with the lifecycle of the Metadata object itself, and
with the notion of versioning. Versioning is really the process that generates the final data store
references, and it is used in conjunction with data transfer as part of the archiving process.
So intrinsically, versioning is co-located with wherever the data transfer occurs.

In addition, there is no guarantee that anything the versioner writes to its copy of the
Metadata object will be written back. That's because the Metadata object is read-only at that
point in time. This is done to limit the scope of modification, and to ensure that the only
pieces of code that can modify that Metadata object are:

1. client-side metadata extractors
2. server-side metadata extractors

This is mainly due to the desire to localize changes to the metadata, so that it can eventually
be persisted in the catalog.

The above is depicted pictorially here, in the final Use Case section:

http://oodt.apache.org/components/maven/filemgr/development/developer.html
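
As a rough illustration of the visibility rules above, here is a minimal sketch in Python rather
than OODT's Java API. Every name in it (core_extract, mrn_extract, version, version_with_rerun,
ingest) is a hypothetical stand-in, not an OODT class, and the sketch only models which metadata
the versioner can see; it does not claim a particular ordering of the real ingest steps.

```python
from types import MappingProxyType

# All names below are hypothetical stand-ins, not OODT classes.

def core_extract(path):
    """Toy extractor: derives Filename from the path."""
    return {"Filename": path.rsplit("/", 1)[-1]}

def mrn_extract(path):
    """Toy extractor: pretends to read an MRN out of the file."""
    return {"MRN": "1010209"}

def version(path, met):
    """Toy metadata-based versioner: builds /archive/MRN_<MRN>/<Filename>
    from whatever metadata it is handed."""
    return "/archive/MRN_{}/{}".format(met.get("MRN"), met.get("Filename"))

def version_with_rerun(path, met):
    """Workaround suggested earlier in the thread: the versioner re-runs the
    extractors itself instead of expecting their output in `met`."""
    derived = dict(met)  # private copy; the read-only `met` is left untouched
    derived.update(core_extract(path))
    derived.update(mrn_extract(path))
    return version(path, derived)

def ingest(path, client_extractors, server_extractors, versioner):
    # Client-side extractors build the metadata the versioner will see.
    client_met = {}
    for extract in client_extractors:
        client_met.update(extract(path))

    # The versioner gets a read-only view; any write attempt raises TypeError.
    reference = versioner(path, MappingProxyType(client_met))

    # Server-extracted metadata reaches the catalog, never the versioner.
    catalog = dict(client_met)
    for extract in server_extractors:
        catalog.update(extract(path))
    return reference, catalog
```

With both toy extractors configured on the server side, ingest("/data/vps_demog.csv", [],
[core_extract, mrn_extract], version) gives the reference /archive/MRN_None/None (Python's None
standing in for the null seen in the original report), while the catalog still contains the MRN.
Moving the extractors to the client side, or passing version_with_rerun as the versioner, gives
/archive/MRN_1010209/vps_demog.csv.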

> 
> Also, when developing custom metExtractors, what factors go into the decision whether
> to use client-side vs server-side extraction for a particular ProductType?

Great question. Client-side metadata extractors plug into, e.g., the Curator, the Crawler, etc.
These are particularly useful for stand-alone types of extraction processes. Server-side metadata
extractors are useful for derived metadata; for catch-all situations where you want to make sure
certain fields are filled; and where you want to co-locate metadata extraction (and its associated
library dependencies) with the file manager server.
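
To make the "derived metadata" and "catch-all" cases concrete, here is a minimal sketch in Python
rather than OODT's Java API. The function catch_all_extract, the REQUIRED_DEFAULTS table, and the
field names DataVersion, Owner and FileExtension are all assumptions made up for illustration;
only Filename appears in this thread.

```python
import os.path

# Hypothetical policy: fields every product must carry, with fallback values.
REQUIRED_DEFAULTS = {"DataVersion": "1", "Owner": "unknown"}

def catch_all_extract(met):
    """Toy server-side extractor: returns only the fields it adds."""
    added = {}
    # Derived metadata: computed from a field an earlier extractor supplied.
    if "Filename" in met:
        extension = os.path.splitext(met["Filename"])[1]
        added["FileExtension"] = extension.lstrip(".")
    # Catch-all: fill required fields the earlier extractors left out.
    for key, default in REQUIRED_DEFAULTS.items():
        if key not in met:
            added[key] = default
    return added
```

Running an extractor like this on the server keeps the defaults policy, and any libraries it
needs, in one place instead of in every client that ingests data.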

HTH!

Cheers,
Chris

> On Nov 3, 2011, at 12:54 PM, Mattmann, Chris A (388J) wrote:
> 
>> On Nov 3, 2011, at 2:44 AM, Nguyen, Ricky wrote:
>> 
>>> in short:
>>> (1) client-side metExtractor + versioner = all client-extracted met is available
>>> to the versioner
>>> (2) server-side metExtractor + versioner = server-extracted met is NOT available
>>> to the versioner (unless, as Chris suggested, versioner re-runs server-side metExtractor)
>>> 
>>> Is (2) expected behavior?
>> 
>> Yep, sure is. 
>> 
>> Cheers,
>> Chris
>> 
>>> -Ricky
>>> 
>>> On Nov 2, 2011, at 10:16 PM, Mattmann, Chris A (388J) wrote:
>>> 
>>>> Hi Ricky,
>>>> 
>>>> You're running into the issue of where/when Versioning is done. 
>>>> 
>>>> Right now you are using a server-side met extractor -- that metadata is extracted
>>>> on the server side, cataloged, but is _not_ passed back to the client for use in
>>>> client-side versioning (which I'm guessing you're using).
>>>> 
>>>> One way around this is to take an approach similar to the FinalFileLocationExtractor
>>>> -- that is: make your versioner run the server-side met extractor as part of its
>>>> versioning process to derive the same metadata that you want used for versioning.
>>>> Or, alternatively, bake in somehow (to the metadata stream that you use in read-only
>>>> form in the versioner) the field that you are interested in flowing through.
>>>> 
>>>> HTH!
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>>> On Nov 2, 2011, at 4:20 PM, Nguyen, Ricky wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> My MetadataBasedFileVersioner can't see the met produced by my custom metExtractor.
>>>>> 
>>>>> I've read OODT-72. That issue describes using the Versioner's calculated Reference
>>>>> to assign Metadata (ver -> met). My issue is the opposite direction, using extracted
>>>>> Metadata in the Versioner's Reference calculation.
>>>>> 
>>>>> For example, suppose my metExtractor assigns a value to the "MRN" element. Then I
>>>>> want my versioner to create a datastore reference at "/[MRN]/[Filename]".
>>>>> 
>>>>> My product-types.xml (abbreviated):
>>>>> <type name="CustomProdType">
>>>>> <versioner class="CustomMetBasedFileVersioner"/>
>>>>> <extractor class="CoreMetExtractor"/>
>>>>> <extractor class="MimeTypeExtractor"/>
>>>>> <extractor class="MRNExtractor"/>
>>>>> <extractor class="FinalFileLocationExtractor"/>
>>>>> </type>
>>>>> 
>>>>> After I ingest the file, I dump the met (using MetadataDumper) and the product
>>>>> (using ProductDumper). The met looks fine:
>>>>> <key>FileLocation</key>
>>>>> <val>%2FUsers%2Frnguyen%2Fvpicu%2Fdata%2Farchive%2FMRN_1010209</val>
>>>>> 
>>>>> But the product reference doesn't:
>>>>> <reference dataStore="file:/Users/rnguyen/vpicu/data/archive/MRN_null/null" orig="file:///Users/rnguyen/vpicu/components/filemgr/policy/cerner/vps_demog.csv" size="1114427"/>
>>>>> 
>>>>> Is this an issue? Or am I not using the components correctly? Is there a better
>>>>> way to achieve what I want?
>>>>> 
>>>>> Thanks,
>>>>> Ricky
>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> CONFIDENTIALITY NOTICE: This e-mail message, including any attachments,
>>>>> is for the sole use of the intended recipient(s) and may contain confidential
>>>>> or legally privileged information. Any unauthorized review, use, disclosure
>>>>> or distribution is prohibited. If you are not the intended recipient, please
>>>>> contact the sender by reply e-mail and destroy all copies of this original message.
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> 
>>>> 
>>>> 
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Chris Mattmann, Ph.D.
>>>> Senior Computer Scientist
>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> Office: 171-266B, Mailstop: 171-246
>>>> Email: chris.a.mattmann@nasa.gov
>>>> WWW:   http://sunset.usc.edu/~mattmann/
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Adjunct Assistant Professor, Computer Science Department
>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> 



