From: "Mattmann, Chris A (388J)"
To: "Starch, Michael D (388L)"
CC: "dev@oodt.apache.org"
Date: Wed, 12 Oct 2011 10:45:06 -0700
Subject: Re: Overriding equals/hashCode for the Product class
Message-ID: <4DB5C559-C56C-4179-BAF9-1881434FD629@jpl.nasa.gov>
In-Reply-To: <4C48A68D-89B5-4CBE-96FD-B5A24FE87DE2@jpl.nasa.gov>

Hey Michael,

On Oct 12, 2011, at 9:43 AM, Starch, Michael D (388L) wrote:

> Here is another related question:
>
> In our branch we have three catalog functions that have very similar
> database back ends: Query, ComplexQuery, and PagedQuery. Unfortunately,
> complexQuery performs its work by first running "query" and then
> running individual metadata requests for each id returned. This is
> inefficient from a database perspective, as you are running many, many
> queries when a single query would suffice (and that single query was
> run once to get the list to begin with).

All of those functions perform that way, right? IOW, don't query and
pagedQuery also work that way?

But yes, I believe we could optimize it by reducing the number of times
we have to query.

> According to our DBA we will see big gains if we eliminate this loop,
> and the complexities of sorting the metadata have been solved (that was
> what yielded my previous question).

It would be really awesome to do some metrics on this, because what's
interesting is that the WHERE clause fields on subsequent queries should
be over productIDs, which themselves are indexed and thus should not be
too computationally expensive.
They certainly involve computation, but I wonder how much optimization
you'll gain at the cost of trying to engineer around this, and whether
the gain is negligible.

> Unfortunately, that means moving some complex query code into the
> catalog, and thus needing to return 2 completely different types from
> one method, "query".

I think the longer-term solution would be to make complexQuery itself a
paged method, and maybe even to get rid of complexQuery and evolve
pagedQuery to take a ComplexQuery object (right now it takes a Query,
but ComplexQuery extends Query, right?). Yes, this would involve making
the other catalogs support this, but it's probably more architecturally
sound in the end.

On the other hand, it doesn't make your life a whole lot easier, so I
could understand if your answer was: "Don't have time at this point."

> My first instinct is to add a complexQuery method to the catalog
> interface (bad, as it breaks older interfaces),

Yep, I wouldn't be in support of that at the Catalog level.

> or sub-interface the catalog interface and add this method (better,
> because old catalogs would work as they do now), but seeing as you
> would like me to move this feature up to apache (assuming we can
> properly page it), perhaps you have a better solution that will keep
> our branch more compatible with apache, so I have less work to do to
> migrate my changes.

What do you think of my proposal above? To evolve pagedQuery to
understand complexQuery (and thus to get the advantage of having
complexQueries be paged, which we're currently missing).

Cheers,
Chris

> On 11.10.2011, at 21:00, Chris A Mattmann wrote:
>
>> Hi All,
>>
>> On Oct 11, 2011, at 7:40 PM, Brian Foster wrote:
>>
>>> the problem with implementing an equals and hashCode function for
>>> the Product object is that it is not always created from db data...
>>> many of the objects in the structs package are 'fill what I know at
>>> the moment'...
>>> no guarantee that any one member variable in the object will always
>>> be set...
>>
>> I totally agree with Brian on this. For any of the FM objects in
>> o.a.oodt.cas.filemgr.structs (and furthermore in any
>> o.a.oodt.*.structs package), any of the fields of the object may (or
>> may not) be filled at any point in time. It really depends on the
>> lifecycle of the object and the downstream use of it in a service, in
>> the core, or in some extension point. The objects are meant to be
>> light-weight, and not representative of the *full* set of information
>> at any point in time unless absolutely necessary (thereby lowering
>> the total system footprint, etc.).
>>
>>> for instance when a Product is created on the client side for an
>>> ingest the productId is not set until after ingestion... also the
>>> current trunk filemgr's Product object doesn't have an ingested or
>>> received time attached to it... at least the last time I checked it
>>> didn't... lol...
>>
>> +1, you are right, it's still that way, for the above stated reasons.
>>
>>> so an equals method which, say, just checked against productId and
>>> productName could give a false positive in some cases... for example
>>> making two sequential calls to getProductById() then calling equals
>>> (assuming we implemented it) on the 2 Product objects returned would
>>> return true... but if the Product was updated between the 2 calls,
>>> equals really should return false because the first Product object
>>> is out of date...
>>
>> +1
>>
>>> and doing a deep equals on the Product object would make the
>>> operation expensive... the Product object is more meant to be an
>>> information carrier... I would recommend storing your Products in a
>>> Map<String, Product> where the String key is ProductId
>>
>> +1, agreed.
>> Using a Map<String, Product> structure is a good way to obviate
>> this, and then to define some locally unique key function inside of
>> that map (or accept the uniqueness of the product ID, which *should
>> be* unique at least within a single FM catalog).
>>
>> Cheers,
>> Chris
>>
>>> On Oct 11, 2011, at 4:46 PM, "Starch, Michael D (388L)" wrote:
>>>
>>>> Chris et al.,
>>>>
>>>> Do you see any problems overriding the default equals and hashCode
>>>> methods in the Product class (checking by memory
>>>> address/reference) with something that checks to see if the
>>>> products logically represent the same thing (same id, name, etc.)?
>>>>
>>>> My issue is the following: I receive data back from the database
>>>> with multiple lines representing a single product (this is a
>>>> database thing, and the desired behavior). Thus if I iterate across
>>>> the results, I will get multiple Product objects that represent one
>>>> real Product (and contain equivalent member variables). In essence
>>>> they are the same "Product". I can write cleaner, faster code to
>>>> combine the results if I can test them for equality and hash them
>>>> directly, without first pulling out the productName or Id.
>>>>
>>>> This will be a problem if there is some code that expects two
>>>> "Products" that have identical member variables to fail the
>>>> equality test if they are distinct objects.
>>>>
>>>> Thanks,
>>>>
>>>> -Michael
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattmann@nasa.gov
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
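
For reference, the Map-keyed-by-product-ID suggestion from the thread
can be sketched roughly as below. This is only an illustration under
stated assumptions: ProductRow, MergedProduct, and mergeById are
hypothetical stand-ins made up for this sketch, not the real
o.a.oodt.cas.filemgr.structs.Product class or any catalog API. The
point is that several result rows describing one product can be
combined by ID without overriding equals/hashCode on the carrier
object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProductMergeSketch {

    /** Hypothetical result row; NOT the real OODT Product class. */
    static final class ProductRow {
        final String productId;
        final String productName;
        final String reference; // one per-row value, e.g. a file reference

        ProductRow(String productId, String productName, String reference) {
            this.productId = productId;
            this.productName = productName;
            this.reference = reference;
        }
    }

    /** Hypothetical merged view of all rows that share one product ID. */
    static final class MergedProduct {
        final String productId;
        final String productName;
        final List<String> references = new ArrayList<>();

        MergedProduct(String productId, String productName) {
            this.productId = productId;
            this.productName = productName;
        }
    }

    /**
     * Groups rows by product ID. Identity lives in the map key, so the
     * row objects keep the default reference-based equals/hashCode.
     * LinkedHashMap preserves the order in which IDs first appear.
     */
    static Map<String, MergedProduct> mergeById(List<ProductRow> rows) {
        Map<String, MergedProduct> byId = new LinkedHashMap<>();
        for (ProductRow row : rows) {
            if (row.productId == null) {
                continue; // e.g. a client-side product that was never ingested
            }
            MergedProduct merged = byId.computeIfAbsent(
                    row.productId, id -> new MergedProduct(id, row.productName));
            merged.references.add(row.reference);
        }
        return byId;
    }

    public static void main(String[] args) {
        List<ProductRow> rows = Arrays.asList(
                new ProductRow("p1", "alpha", "file-a"),
                new ProductRow("p1", "alpha", "file-b"),
                new ProductRow("p2", "beta", "file-c"));
        for (MergedProduct p : mergeById(rows).values()) {
            System.out.println(p.productId + " " + p.productName + " " + p.references);
        }
    }
}
```

Keeping the uniqueness rule in the map key means a partially filled
object (no ID yet, stale fields) never has to answer an equality
question; the caller decides explicitly what to do with rows that lack
an ID.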