oodt-dev mailing list archives

From "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: Metadata based versioning
Date Sun, 01 Jun 2014 04:50:55 GMT
Hey Tom,

-----Original Message-----

From: Thomas Bennett <lmzxq.tom@gmail.com>
Reply-To: "dev@oodt.apache.org" <dev@oodt.apache.org>
Date: Friday, May 30, 2014 6:26 AM
To: OODT <dev@oodt.apache.org>
Subject: Re: Metadata based versioning

>Hey Chris,
>Thanks for your reply.
>I think you may have clarified some of my understanding of oodt under the
>hood. Woot. (or is that woodt?)

Haha, woodt it is.

>Firstly, from OODT-72 I can see how the design decisions were made. It
>so happens that I'm wanting the filename for versioning and suddenly I
>understand why FinalFileLocationExtractor is needed. I will now use it
>with confidence :-).
>Your use of the term 'client side data movement' confused me at first, so
>I had to think about it a bit. I was always under the impression (a naive
>misconception) that if your file manager existed on a "machine B" you
>need to do a remote data transfer to use that file manager.
>But what you're saying is that the following setup is possible:
>   - Machine A (client): crawler + repository path + local data transfer
>   (i.e. machine A, or the 'client', does not need a file manager running and
>   does not need to do a remote data transfer to machine B)
>   - Machine B (server): file manager (does not need the repository path to
>   archive files)

Yes this is totally possible. Imagine the following configuration:

Machine A: no file manager, but has crawler, + can see src + dest path with
local data transfer (note *local* is a misnomer, b/c through distributed
systems like NFS, Hadoop, Spark/Shark, GlusterFS, etc. we can logically take
local commodity shared nothing disk and federate them to make them appear as
one big one - each of the preceding distributed file system technologies
has different strengths and benefits, but from OODT's perspective, it can all
be local even if it truly isn't).

Machine B: file manager

Use Case:

Ingest a file on machine A into the File Manager on machine B.
  - totally doable
  - crawler on A contacts (by default) http://B:9000/ and then ingests
into file manager using client side transfer.
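
To make the flow above concrete, here is a rough Python sketch of the idea.
It is not OODT code: FakeFileManager, extract_metadata, version and ingest
are all hypothetical names, the "Filename" and "FileLocation" keys are only
illustrative, and the "group by file extension" rule is an assumed stand-in
for a metadata based versioner. The point is just that the client (machine A)
extracts metadata, versions, and moves the file, while the file manager
(machine B) only ever receives metadata.

```python
import shutil
import tempfile
from pathlib import Path


class FakeFileManager:
    """Hypothetical stand-in for the file manager on machine B."""

    def __init__(self):
        self.catalog = []

    def register(self, metadata):
        # The server only catalogs metadata; it never touches the file bytes.
        self.catalog.append(dict(metadata))
        return len(self.catalog)


def extract_metadata(src):
    # Client-side metadata extraction (illustrative key name).
    return {"Filename": src.name}


def version(metadata, repo_root):
    # Metadata based versioning: derive the archive path from the metadata.
    # Assumed rule for this sketch: group files by extension.
    ext = Path(metadata["Filename"]).suffix.lstrip(".") or "unknown"
    return repo_root / ext / metadata["Filename"]


def ingest(src, repo_root, file_manager):
    # Everything here runs on machine A (the client).
    metadata = extract_metadata(src)
    dest = version(metadata, repo_root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # "local" transfer: A can see both src and dest
    metadata["FileLocation"] = str(dest.parent)
    return file_manager.register(metadata)  # only metadata goes to machine B


def demo():
    staging = Path(tempfile.mkdtemp())
    repo = Path(tempfile.mkdtemp())
    src = staging / "obs_001.dat"
    src.write_text("data")
    fm = FakeFileManager()
    ingest(src, repo, fm)
    return fm.catalog
```

In a real deployment the register() call would be the crawler talking to the
file manager at http://B:9000/, and the copy would go through whatever
shared or federated file system makes the destination path visible on A.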

Make sense?

>Have I got the right idea?



Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-5th floor
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA

>On 30 May 2014 07:01, Mattmann, Chris A (3980) <
>chris.a.mattmann@jpl.nasa.gov> wrote:
>> Hey Tom,
>> You've correctly discovered this. This was an intentional by-design
>> artifact of my belief that versioning and data movement should be
>> sort of co-located on the same machine. So if you do client side
>> data movement (which most people do), then the versioning should
>> happen alongside of it, and thus any metadata extraction present
>> there should be available during versioning for use in e.g., Metadata
>> based versioning.
>> The rub comes in the issue where the metadata is generated on the
>> server side and you expect versioning to be available to the system.
>> One way of getting around this is taking a look at the way that
>> the FinalFileLocationExtractor [1] grabs the latest version of the
>> CoreMetKeys.FILE_LOCATION property and then makes it available for e.g.,
>> versioning.
>> See discussion too in OODT-72 [2] for some rationale behind my
>> sentiments there. Happy to discuss!
>> Cheers,
>> Chris
>> [1] http://s.apache.org/bvd
>> [2] https://issues.apache.org/jira/browse/OODT-72
