oodt-dev mailing list archives

From Bruce Barkstrom <brbarkst...@gmail.com>
Subject Re: [jira] [Commented] (OODT-551) DataSourceCatalog implementation does not preserve order of metadata values
Date Sun, 20 Jan 2013 17:56:23 GMT
There is an interesting and fundamental dichotomy between identifier
schemas intended solely for use with relational databases, where the
primary keys have no intrinsic order (like UUIDs and related
cryptographic digests), and identifier schemas that may be used more by
humans. In terms of the Open Archival Information System (OAIS)
Reference Model, this could be stated as whether the Designated
Community for the archive consists primarily of human users or may also
include machines performing tasks authorized by humans. Archives serving
primarily human communities may well include sequencing information, and
their identifiers may contain information that is semantically usable by
humans, although perhaps not by machines running typical software.

There are a number of problems that arise when the archive contains
objects that are composed (or aggregated) from other objects, and in
which human beings may be involved in producing or interpreting the
identifiers.

It's an interesting problem. For OODT, it might be a good idea to be
prepared for both kinds of possibilities, as the JPL example shows.

Bruce B.

On Sun, Jan 20, 2013 at 12:34 PM, Chris A. Mattmann (JIRA)
<jira@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/OODT-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558323#comment-13558323]
>
> Chris A. Mattmann commented on OODT-551:
> ----------------------------------------
>
> Hey BFost -- first off, yes, I agree with you. Long have I preached to you
> and others that the Metadata key-values structure implies no ordering of
> the values. However, a real side effect for years has been that the Lucene
> catalog on persistence has maintained such an order (it still does -- the
> keys are unordered b/c it used to be a hash map, but the values in it have
> always been ordered). This is an artifact of the way that Lucene
> stores/persists fields.
>
> That being said, the DataSourceCatalog has never preserved these
> semantics. It's always, as you've said, been whatever order the values
> were inserted into it. Metadata values were already ordered before the
> switch over to the Metadata group style (and away from the HashMap), so
> even before that switch over, entering those values into the
> DataSourceCatalog made them unordered.
>
> Luca and I noticed this on a JPL project ("VFASTR", a radio astronomy
> project) where we followed the classic pattern of starting out with
> Lucene, waiting until it doesn't scale anymore, then moving on to the
> DataSourceCatalog and a DB. When doing so, a bunch of downstream code for
> VFASTR broke b/c all along we had made the (incorrect) assumption that the
> values were ordered b/c that's the behavior we were seeing with the
> LuceneCatalog. Since I'm not coding too much on that project, and since
> Luca is, and since Luca didn't have the history, he and Andrew and others
> wrote all that code assuming that all would be well, and then when we
> switched to the DataSourceCatalog, all hell broke loose heh ;)
>
> So, Luca had done some testing and tried to come up with something in
> VFASTR-land that would work and that was the "fix"/"updates" to the
> DataSourceCatalog. I encouraged him to not just keep it in JPL project
> ville, but to bring it up to Apache and contribute it back. I think there
> wasn't enough time to discuss that contribution, which is why I rolled the
> rev back and opened it up for discussion like we're having here.
>
> So my proposal was to:
>
> # introduce a property into DataSourceCatalog for ordering metadata
> fields. By default, it's turned off (to preserve the prior behavior of not
> maintaining that ordering). It can be turned on to introduce a 3-4 line
> functionality patch that adds an ORDER BY clause to the SQL returning
> metadata values; this assumes that the person deploying OODT has already
> installed a simple schema update to handle that pkey.
> # make sure this is also unit tested
>
> What do you think of that? Does that sound OK? Thanks for your feedback.
>
>
> > DataSourceCatalog implementation does not preserve order of metadata values
> > ---------------------------------------------------------------------------
> >
> >                 Key: OODT-551
> >                 URL: https://issues.apache.org/jira/browse/OODT-551
> >             Project: OODT
> >          Issue Type: Bug
> >          Components: file manager
> >    Affects Versions: 0.5
> >            Reporter: Luca Cinquini
> >            Assignee: Luca Cinquini
> >             Fix For: 0.6
> >
> >         Attachments: OODT-551.luca.patch.txt
> >
> >
> > The table that stores the metadata (key, value) pairs for the File
> > Manager database-based implementation has no primary key; as a
> > consequence, values are not guaranteed to be returned in any order,
> > which is a problem for applications that rely on the order of the
> > values (for example, among different metadata keys).
>

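A minimal sketch of the proposal above, written in Python against an in-memory SQLite database rather than the Java DataSourceCatalog itself. The table layout, the surrogate `id` key, and the `order_values` flag are hypothetical stand-ins for the schema update and the off-by-default property described in the proposal; they are not OODT's actual schema or configuration. The last lines show the kind of downstream assumption described in the issue: values for different keys are paired by position, which only works if each key's values come back in insertion order.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema: table and column names are illustrative assumptions,
# not the real OODT File Manager schema. The surrogate "id" column plays the
# role of the primary key added by the proposed schema update.
SCHEMA = """
CREATE TABLE metadata (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    product_id     TEXT NOT NULL,
    element_id     TEXT NOT NULL,
    metadata_value TEXT NOT NULL
)
"""


def store(conn, product_id, met):
    """Insert each (key, value) pair; a key's values are inserted in list order."""
    for key, values in met.items():
        for value in values:
            conn.execute(
                "INSERT INTO metadata (product_id, element_id, metadata_value) "
                "VALUES (?, ?, ?)",
                (product_id, key, value),
            )


def load(conn, product_id, order_values=False):
    """Read metadata back as a dict of key -> list of values.

    order_values is a hypothetical stand-in for the proposed, off-by-default
    property. When set, ORDER BY on the surrogate key returns each key's values
    in insertion order. Without it, SQL makes no ordering guarantee, even if a
    particular database happens to return rows in insertion order.
    """
    sql = "SELECT element_id, metadata_value FROM metadata WHERE product_id = ?"
    if order_values:
        sql += " ORDER BY id"
    met = defaultdict(list)
    for key, value in conn.execute(sql, (product_id,)):
        met[key].append(value)
    return dict(met)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
store(conn, "p1", {
    "Filename": ["a.dat", "b.dat", "c.dat"],
    "FileSize": ["10", "20", "30"],
})

met = load(conn, "p1", order_values=True)
# Downstream code that pairs values positionally across keys: correct only
# when each key's values are returned in the order they were stored.
pairs = list(zip(met["Filename"], met["FileSize"]))
print(pairs)  # [('a.dat', '10'), ('b.dat', '20'), ('c.dat', '30')]
```

Leaving the flag off by default mirrors the proposal: existing deployments keep their current behavior, and only deployments that have applied the schema update opt in to the ORDER BY.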