oodt-dev mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: Data transfer questions
Date Mon, 19 Mar 2012 20:52:32 GMT
Hey Tom,

AWESOME. I smell Wiki page :)

Read on below:

On Mar 19, 2012, at 8:18 PM, Thomas Bennett wrote:

> Versioner schemes
> The Data Transferers have an acute coupling with the Versioner scheme; case in point:
> if you are doing InPlaceTransfer,
> you need a versioner that will handle file paths that don't change from src to dest.
> The Versioner is used to describe how a target directory is created for a file to archive,
> i.e., a directory structure where the data will be placed. So if I have an archive root at /var/kat/archive/data/
> and I use a basic versioner, it will archive a file called 1234567890.h5 at /var/kat/archive/data/1234567890.h5/1234567890.h5.
> So this would describe the destination for a local data transfer.
> I have the following versioner set in my policy/product-types.xml.
> policy/product-types.xml
> <versioner class="org.apache.oodt.cas.filemgr.versioning.BasicVersioner"/>

Ah, gotcha. You might consider using the MetadataBasedFileVersioner. It lets you define a filePathSpec,

e.g., /[PrincipalInvestigator]/[Project]/[AcquisitionDate]/[Filename]

and then versions or "places" the resulting product files into the directory structure that spec describes.

To create the above, you would simply subclass the versioner like so:

public class KATVersioner extends MetadataBasedFileVersioner {
   private String filePathSpec = "/[PrincipalInvestigator]/[Project]/[AcquisitionDate]/[Filename]";

   public KATVersioner() {
      setFilePathSpec(filePathSpec);
   }
}
You can even refer to keys that don't exist yet, and then dynamically generate them (and their
values) by overriding the createDataStoreReferences method:

 public void createDataStoreReferences(Product product, Metadata met)
     throws VersioningException {
     // do work to generate AcquisitionDate here
     met.replaceMetadata("AcquisitionDate", acqdate);
     super.createDataStoreReferences(product, met);
 }

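To wire a custom versioner like that in, you'd point the product type at it in policy/product-types.xml, exactly as with the BasicVersioner. The za.ac.kat package here is a made-up placeholder for wherever the KATVersioner subclass actually lives:

```xml
<versioner class="za.ac.kat.versioning.KATVersioner"/>
```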
> Just out of curiosity... why is this called a versioner?

Hehe, if it's weird in OODT, it most likely resulted from me :) I originally saw
this as a great tool to "version" or allow for multiple copies of a file on disk, e.g., with
file (or directory-based) metadata to delineate the versions. Over time it really grew to be a
"URIGenerationScheme" or "ArchivePathGenerator". Those would be better names, but Versioner
stuck, so here we are :)

> Using the File Manager as the client
> Configuring a data transfer in filemgr.properties, and then not using the crawler directly,
> but e.g., using the XmlRpcFileManagerClient directly,
> you can tell the server (on the ingest(...) method) to handle all the file transfers
> for you. In that case, the server needs a
> Data Transferer configured, and the above properties apply, with the caveat that the
> FM server is now the "client" that is transferring
> the data to itself :)
> If I set the following property in the etc/filemgr.properties file
> filemgr.datatransfer.factory=org.apache.oodt.cas.filemgr.datatransfer.RemoteDataTransferFactory
> I did a quick try of this today, trying an ingest on my localhost (to avoid any sticky
> network issues), and I was able to perform an ingest.
> I see you can specify the data transfer factory to use, so I assume the filemgr.datatransfer.factory
> setting is just the default used when none is specified on the command line. Is this true?

It's true: if you are doing server-based transfers (by calling the filemgr-client --ingestProduct
method directly, without specifying the data transfer factory on the command line), the
filemgr.datatransfer.factory property is what the server falls back on.

> I ran a version of the command line client (my own version of filemgr-client with abs
> paths to the configuration files):
> cas-filemgr-client.sh --url http://localhost:9101 --operation --ingestProduct --refs
> /Users/thomas/1331871808.h5 --productStructure Flat --productTypeName KatFile --metadataFile
> /Users/thomas/1331871808.h5.met --productName 1331871808.h5 --clientTransfer --dataTransfer
> org.apache.oodt.cas.filemgr.datatransfer.RemoteDataTransferFactory
> With the data transfer factory also spec'ed as:
> etc/filemgr.properties
> filemgr.datatransfer.factory=org.apache.oodt.cas.filemgr.datatransfer.RemoteDataTransferFactory
> And the versioner set as:
> policy/product-types.xml
> <versioner class="org.apache.oodt.cas.filemgr.versioning.BasicVersioner"/>
> And it ingested the file. +1 for OODT!


> Local and remote transfers to the same filemgr
> One way to do this is to write a Facade java class, e.g., MultiTransferer, that can, e.g.,
> on a per-product-type basis,
> decide whether to call and delegate to LocalDataTransfer or RemoteDataTransfer. If written
> in a configurable way, that would be
> an awesome addition to the OODT code base. We could call it ProductTypeDelegatingDataTransfer.
> I'm thinking I would prefer to have some crawlers specifying how files should be transferred.
> Is there any particular reason why this would not be a good idea, as long as the client specifies
> the transfer method to use?

Yeah this is totally acceptable -- you can simply tell the crawler which TransferFactory to
use. If you wanted the crawlers to sense it
automatically based on Product Type (which also has to be provided), then you could use a
method similar to the above.
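For the ProductTypeDelegatingDataTransfer facade idea above, here's a minimal, dependency-free sketch of the per-product-type delegation. The real class would implement org.apache.oodt.cas.filemgr.datatransfer.DataTransfer and delegate to actual LocalDataTransferer/RemoteDataTransferer instances; the SimpleTransfer interface here is a simplified stand-in so the pattern is self-contained:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for OODT's DataTransfer interface.
interface SimpleTransfer {
    String transfer(String productFile);
}

// Facade that delegates to a per-product-type transferer,
// falling back to a configurable default when no mapping exists.
public class ProductTypeDelegatingDataTransfer implements SimpleTransfer {
    private final Map<String, SimpleTransfer> byType = new HashMap<>();
    private final SimpleTransfer fallback;
    private String productType = "Unknown";

    public ProductTypeDelegatingDataTransfer(SimpleTransfer fallback) {
        this.fallback = fallback;
    }

    public void register(String productTypeName, SimpleTransfer transferer) {
        byType.put(productTypeName, transferer);
    }

    public void setProductType(String productTypeName) {
        this.productType = productTypeName;
    }

    @Override
    public String transfer(String productFile) {
        // Pick the transferer mapped to the current product type, else the fallback.
        return byType.getOrDefault(productType, fallback).transfer(productFile);
    }
}
```

In the real thing the product-type-to-transferer map would come from configuration (hence "if written in a configurable way"), and the product type would be read off the Product being ingested rather than set explicitly.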

> Getting the product to a second archive
> One way to do it is to simply stand up a file manager at the remote site and catalog,
> and then do remote data transfer (and met transfer) to take care of that.
> Then as long as your XML-RPC ports are open, both the data and metadata can be backed
> up by simply doing the same ingestion mechanisms. You could
> wire that up as a Workflow task to run periodically, or as part of your std ingest pipeline
> (e.g., a Crawler action that on postIngestSuccess backs up to the remote
> site by ingesting into the remote backup file manager).
> Okay. Got it! I'll see if I can wire up both options!


> I'd be happy to help you down either path.
> Thanks! Much appreciated.
> > I was thinking, perhaps using the functionality described in OODT-84 (Ability for
> > File Manager to stage an ingested Product to one of its clients) and then have a second crawler
> > on the backup archive which will then update its own catalogue.
> +1, that would work too!
> Once again, thanks for the input and advice - always informative ;) 

Haha anytime dude. Great work!


Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory, Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
