Date: Sun, 20 Jan 2013 12:56:23 -0500
From: Bruce Barkstrom
To: dev@oodt.apache.org
Subject: Re: [jira] [Commented] (OODT-551) DataSourceCatalog implementation does not preserve order of metadata values

There is an interesting and fundamental dichotomy between identifier schemes intended solely for use with relational databases, where the primary keys have no intrinsic order (like UUIDs and related cryptographic digests), and identifier schemes that may be used more by humans. In terms of the Open Archival Information System (OAIS) Reference Model, this could be stated as whether the Designated Community for the archive consists primarily of human users or may also include machines performing tasks authorized by humans. Identifiers intended primarily for humans may well include sequencing information, and they may carry information that humans can interpret semantically, although machines running typical software perhaps cannot.

A number of problems arise when the archive contains objects that are composed (or aggregated) from other objects, and when human beings may be involved in producing or interpreting the identifiers. It's an interesting problem.

For OODT, it might be a good idea to be prepared for both kinds of possibilities, as the JPL example shows.

Bruce B.

On Sun, Jan 20, 2013 at 12:34 PM, Chris A. Mattmann (JIRA) wrote:
>
> [ https://issues.apache.org/jira/browse/OODT-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558323#comment-13558323 ]
>
> Chris A. Mattmann commented on OODT-551:
> ----------------------------------------
>
> Hey BFost -- first off, yes, I agree with you. Long have I preached to you
> and others that the Metadata key-values structure implies no ordering of
> the values. However, a real side effect for years has been that the Lucene
> catalog on persistence has maintained such an order (it still does -- the
> keys are unordered b/c it used to be a hash map, but the values in it have
> always been ordered). This is an artifact of the way that Lucene
> stores/persists fields.
>
> That being said, the DataSourceCatalog has never preserved these semantics.
> It's always, as you've said, been whatever order the values were inserted
> into it. Since values for Metadata prior to the switchover to the Metadata
> group style (and away from the HashMap) were ordered, prior to that
> switchover, entering those values into the DataSourceCatalog made them
> unordered.
>
> Luca and I noticed this on a JPL project ("VFASTR", a radio astronomy
> project) where we followed the classic approach of starting out with
> Lucene, waiting until it doesn't scale anymore, then moving on to the
> DataSourceCatalog and a DB. When doing so, a bunch of downstream code for
> VFASTR broke b/c all along we had made the (incorrect) assumption that the
> values were ordered b/c that's the behavior we were seeing with the
> LuceneCatalog. Since I'm not coding too much on that project, and since
> Luca is, and since Luca didn't have the history, he and Andrew and others
> wrote all that code assuming that all would be well, and then when we
> switched to the DataSourceCatalog, all hell broke loose heh ;)
>
> So, Luca had done some testing and tried to come up with something in
> VFASTR-land that would work, and that was the "fix"/"updates" to the
> DataSourceCatalog. I encouraged him to not just keep it in JPL project
> ville, but to bring it up to Apache and contribute it back.
> I think there wasn't enough time to discuss that contribution, which is
> why I rolled the rev back and opened it up for discussion like we're
> having here.
>
> So my proposal was to:
>
> # introduce a property into DataSourceCatalog for ordering metadata
> fields. By default, it's turned off (to preserve the prior behavior of not
> maintaining that ordering). When turned on, it enables a 3-4 line
> functional patch that adds an ORDER BY clause to the SQL returning met
> values, and it assumes that the person deploying OODT has already applied
> a simple schema update that adds that pkey.
> # make sure this is also unit tested
>
> What do you think of that? Does that sound OK? Thanks for your feedback.
>
> > DataSourceCatalog implementation does not preserve order of metadata values
> > ---------------------------------------------------------------------------
> >
> > Key: OODT-551
> > URL: https://issues.apache.org/jira/browse/OODT-551
> > Project: OODT
> > Issue Type: Bug
> > Components: file manager
> > Affects Versions: 0.5
> > Reporter: Luca Cinquini
> > Assignee: Luca Cinquini
> > Fix For: 0.6
> >
> > Attachments: OODT-551.luca.patch.txt
> >
> > The table that stores the metadata (key, value) pairs for the File
> > Manager database-based implementation has no primary key; as a
> > consequence, values are not guaranteed to be returned in any order, which
> > is a problem for applications that rely on the order of the values (for
> > example, among different metadata keys).
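The opt-in ordering proposed in the quoted JIRA comment can be sketched in Java, the language OODT is written in. This is a minimal, hypothetical illustration, not the OODT-551 patch and not the real DataSourceCatalog API: the class MetadataQuerySketch, the table name example_metadata, the columns product_id, element_id, metadata_value and met_id, and the orderValues flag are all assumptions. The sketch rests on one property of SQL: without an ORDER BY clause, a SELECT makes no guarantee about row order. Adding an auto-incrementing key (here met_id, standing in for the schema update the proposal mentions) and sorting on it returns values in the order they were written, while leaving the flag off keeps the old unordered behavior.

```java
import java.util.Objects;

/**
 * Hypothetical sketch of the opt-in value ordering discussed in OODT-551.
 * All table/column names and the flag are illustrative assumptions,
 * not the actual DataSourceCatalog schema or API.
 */
public class MetadataQuerySketch {

    // Assumed schema update (MySQL-flavored) a deployer would apply first,
    // giving each (key, value) row an auto-incrementing surrogate key so that
    // rows inserted afterward receive increasing ids:
    //   ALTER TABLE example_metadata
    //     ADD COLUMN met_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY;

    private final String metadataTable; // hypothetical metadata table name
    private final boolean orderValues;  // hypothetical flag; off by default

    public MetadataQuerySketch(String metadataTable) {
        this(metadataTable, false); // default keeps the prior, unordered behavior
    }

    public MetadataQuerySketch(String metadataTable, boolean orderValues) {
        this.metadataTable = Objects.requireNonNull(metadataTable, "metadataTable");
        this.orderValues = orderValues;
    }

    /** Builds the SELECT that fetches every (key, value) row for one product. */
    public String buildSelect() {
        StringBuilder sql = new StringBuilder()
                .append("SELECT element_id, metadata_value FROM ")
                .append(metadataTable)
                .append(" WHERE product_id = ?");
        if (orderValues) {
            // Without ORDER BY, SQL does not guarantee any row order;
            // sorting on the insertion-ordered key returns values as written.
            sql.append(" ORDER BY met_id");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(new MetadataQuerySketch("example_metadata").buildSelect());
        System.out.println(new MetadataQuerySketch("example_metadata", true).buildSelect());
    }
}
```

Running main prints both queries; only the second, built with the flag turned on, ends in ORDER BY met_id. In a real catalog the string would be handed to a JDBC PreparedStatement with the product id bound to the placeholder. Because the ordering sorts all rows by insertion, multiple values under the same key come back in the order they were stored.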