oodt-dev mailing list archives

From Bruce Barkstrom <brbarkst...@gmail.com>
Subject Re: Datastore references for a duplicate products
Date Wed, 22 Apr 2015 13:50:14 GMT
As a variant, what happens if you discover that several of
the files have been corrupted (by errors in software or by
hardware malfunction during the write process)?  If the files
are part of a series (such as a time series) where missing
data could cause a serious increase in uncertainty, do you
replace the erroneous members of the series - and if so,
how do you identify the replacement values?  You should
also consider how you deal with redoing files where the
data that was corrupted was used to create other files in a
complex workflow.
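
As a rough illustration of the "redoing files" problem, here is a minimal Python sketch. It assumes the archive keeps a provenance record mapping each derived file to the inputs it was built from. The names `derived_from` and `downstream_products` and the file names are hypothetical and are not part of OODT.

```python
from collections import deque


def downstream_products(derived_from, corrupted):
    """Return every product that directly or indirectly used a corrupted input."""
    # Invert the "product -> inputs" map into "input -> products built from it".
    consumers = {}
    for product, inputs in derived_from.items():
        for source in inputs:
            consumers.setdefault(source, set()).add(product)

    # Breadth-first walk from the corrupted files through their consumers.
    affected = set()
    queue = deque(corrupted)
    while queue:
        current = queue.popleft()
        for product in consumers.get(current, ()):
            if product not in affected:
                affected.add(product)
                queue.append(product)
    return affected


# Hypothetical provenance records: derived file -> list of input files.
derived_from = {
    "L1_day1": ["raw_day1"],
    "L1_day2": ["raw_day2"],
    "L2_week1": ["L1_day1", "L1_day2"],
    "L3_month": ["L2_week1"],
}

print(sorted(downstream_products(derived_from, ["raw_day2"])))
# prints ['L1_day2', 'L2_week1', 'L3_month']
```

Everything in the returned set would have to be regenerated once the corrupted inputs are replaced, in an order that respects the workflow's dependencies.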

Incidentally, we had a case where a router hardware failure
corrupted a large number of transmitted files.  Those had
to be replaced.  The replacement files then had to be input
to a fairly complex production workflow, followed by reinsertion
into the data stores we were using.  Not fun!

Some groups concerned with long-term archival don't
permit deletion of data - even if erroneous.  However,
the users really need a homogeneous data record with
as few gaps as possible.  They need to be informed of
the availability of the replacements.

Along this line, in one case we had built a very stringent
consistency check that time always increased in both
inputs to a process that had to merge data sources.
Specifically, data from our instruments came in one
data stream; data on the satellite position (ephemeris)
came in another; data on the satellite attitude came in
a third.  When we got data, one of the sources had put
time-reversed "tape recorder" data on one of the input
files - for about a two-week period.  Trying to rewrite the
error checking would have been so complicated that
we just dropped those two weeks of data and had to live
with the gap.  Personally, I was glad the error checking
worked as intended.
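
For what it's worth, a check of this kind can be sketched in a few lines of Python. This is only an illustration, not the code we actually ran. It assumes each input stream is already available as a list of numeric timestamps, and the function and stream names are made up for the example.

```python
def first_time_reversal(timestamps):
    """Return the index of the first sample whose time does not increase, or None."""
    for i in range(1, len(timestamps)):
        if timestamps[i] <= timestamps[i - 1]:
            return i
    return None


def check_streams(streams):
    """Map each failing stream name to the index of its first offending sample."""
    failures = {}
    for name, timestamps in streams.items():
        bad = first_time_reversal(timestamps)
        if bad is not None:
            failures[name] = bad
    return failures


# Hypothetical timestamps (seconds) for the three input streams.
streams = {
    "instrument": [0.0, 1.0, 2.0, 3.0],
    "ephemeris": [0.0, 1.0, 2.0, 3.0],
    "attitude": [3.0, 2.0, 1.0, 0.0],  # time-reversed "tape recorder" data
}

print(check_streams(streams))
# prints {'attitude': 1}
```

A merge step would refuse to run whenever the returned dictionary is non-empty, which is the strict behaviour described above.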

Bruce B.

On Wed, Apr 22, 2015 at 8:29 AM, Bruce Barkstrom <brbarkstrom@gmail.com> wrote:

> What happens to references to duplicate files stored in an online backup
> directory, as well as ones stored in a remote backup
> location?  In more complicated versions of this question,
> how would a federated archive handle replicas stored
> in other archives?
> Then, would it matter if the archive decided to do a slight
> reformat of the data that merely rearranged the numerical
> values without adding or deleting any of them?
> Bruce B.
> On Wed, Apr 22, 2015 at 6:09 AM, Thomas Bennett <lmzxq.tom@gmail.com>
> wrote:
>> Hi,
>> Okay - so a 'flat' product has a single datastore reference.
>> So, how do you handle redundant copies of products? At the moment I have an
>> independent catalogue at each site.
>> I was thinking of a site metadata key, so multiple products can be
>> filtered, but I thought I would see what other people are doing and if
>> there is any interest in perhaps getting a product (flat or hierarchical)
>> with multiple references, i.e. beyond originalReference and
>> datastoreReference.
>> Or does that totally break the OODT model? It probably does...
>> Also - does anyone store data on a tape library and index it with OODT? I'm
>> talking basic tar on a tape. This obviously breaks the file retrieval, if
>> used, but I'm thinking of how this could be included in the OODT framework
>> and maybe develop some methods.
>> But before I go too deep I thought I would ask.
>> Cheers,
>> Tom
