Mailing-List: contact dev-help@oodt.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@oodt.apache.org
Received-SPF: pass (nike.apache.org: local policy)
From: "Mattmann, Chris A (388J)" <chris.a.mattmann@jpl.nasa.gov>
To: "Starch, Michael D (388L)" <Michael.D.Starch@jpl.nasa.gov>
CC: "dev@oodt.apache.org" <dev@oodt.apache.org>
Date: Wed, 12 Oct 2011 10:45:06 -0700
Subject: Re: Overriding equals/hashCode for the Product class
Thread-Topic: Overriding equals/hashCode for the Product class
Thread-Index: AcyJBq5quPY1xmBYRuukqh+HsU7oOA==
Message-ID: <4DB5C559-C56C-4179-BAF9-1881434FD629@jpl.nasa.gov>
References: <E4A504BD-6C80-412D-8234-6B3F5312B726@jpl.nasa.gov>
 <2C407204-4215-41B9-A8E5-6D5B2576020D@me.com>
 <1C43DAC9-A32A-4395-9371-5ECAC30551C2@jpl.nasa.gov>
 <4C48A68D-89B5-4CBE-96FD-B5A24FE87DE2@jpl.nasa.gov>
In-Reply-To: <4C48A68D-89B5-4CBE-96FD-B5A24FE87DE2@jpl.nasa.gov>
Accept-Language: en-US
Content-Language: en-US
acceptlanguage: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

Hey Michael,

On Oct 12, 2011, at 9:43 AM, Starch, Michael D (388L) wrote:

>
>
> Here is another related question:
>
> In our branch we have three catalog functions that have very similar
> database back ends, Query, ComplexQuery, and PagedQuery.  Unfortunately,
> complexQuery performs its work by first running "query" and then running
> individual metadata requests for each id returned.  This is inefficient
> from a database perspective as you are running many many queries, when a
> single query would suffice (and that single query was run once to get the
> list to begin with).

All of those functions perform that way, right? IOW, don't query and
pagedQuery also work that way?

But yes, I believe we could optimize it by reducing the number of times we
have to query.

>
> According to our DBA we will see big gains if we eliminate this loop, and
> the complexities of sorting the metadata have been solved (that was what
> yielded my previous question).

It would be really awesome to gather some metrics on this. What's
interesting is that the WHERE clauses on the subsequent queries should be
over productIDs, which are themselves indexed, so each lookup should not be
too computationally expensive. They certainly involve computation, but I
wonder how much you'll actually gain for the cost of engineering around
this, and whether the gain is negligible.
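
As a rough illustration of what I'd count before and after, here is a
minimal sketch. It is not filemgr code: QueryCountSketch and all of its
methods are hypothetical, and an in-memory map stands in for the database.
It only shows that the per-id loop costs one round trip per product on top
of the initial query:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a database-backed catalog. It only counts
// round trips so the two strategies can be compared; not real filemgr code.
class QueryCountSketch {

    // Fake "table": productId -> metadata value.
    private final Map<String, String> rows = new LinkedHashMap<>();
    private int roundTrips = 0;

    QueryCountSketch(int productCount) {
        for (int i = 0; i < productCount; i++) {
            rows.put("id-" + i, "metadata-" + i);
        }
    }

    // One round trip: return the matching product IDs.
    List<String> queryIds() {
        roundTrips++;
        return new ArrayList<>(rows.keySet());
    }

    // One round trip per call: lookup keyed on the product ID.
    String getMetadata(String productId) {
        roundTrips++;
        return rows.get(productId);
    }

    // One round trip: IDs and metadata fetched together.
    Map<String, String> queryWithMetadata() {
        roundTrips++;
        return new LinkedHashMap<>(rows);
    }

    int getRoundTrips() {
        return roundTrips;
    }

    // Shape described for complexQuery: query, then one lookup per ID.
    static int perIdRoundTrips(int productCount) {
        QueryCountSketch catalog = new QueryCountSketch(productCount);
        for (String id : catalog.queryIds()) {
            catalog.getMetadata(id);
        }
        return catalog.getRoundTrips();
    }

    // Proposed shape: everything in a single query.
    static int singleQueryRoundTrips(int productCount) {
        QueryCountSketch catalog = new QueryCountSketch(productCount);
        catalog.queryWithMetadata();
        return catalog.getRoundTrips();
    }

    public static void main(String[] args) {
        System.out.println("per-id lookups: " + perIdRoundTrips(1000) + " round trips");
        System.out.println("single query:   " + singleQueryRoundTrips(1000) + " round trips");
    }
}
```

This counts round trips and nothing else. Whether that many indexed lookups
are slower than one larger query by a margin that matters is exactly what
real timings against your database would have to show.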

> Unfortunately, that means moving some complex query code into the
> catalog, and thus needing to return 2 completely different types from one
> method "query".

I think the longer-term solution would be to make complexQuery itself a
paged method, and maybe even to get rid of complexQuery and evolve
pagedQuery to take a ComplexQuery object (right now it takes a Query, but
ComplexQuery extends Query, right?). Yes, this would involve making the
other catalogs support it, but it's probably more architecturally sound in
the end.

On the other hand, it doesn't make your life a whole lot easier, so I could
understand if your answer was: "Don't have time at this point."
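
To make the pagedQuery idea concrete, here is a minimal sketch. It assumes
ComplexQuery extends Query (which I asked about above), and every class,
field, and signature in it is a simplified stand-in for illustration rather
than the actual filemgr API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical catalog that pages over product IDs held in memory.
class PagedCatalogSketch {
    private final List<String> productIds;
    private final int pageSize;

    PagedCatalogSketch(List<String> productIds, int pageSize) {
        this.productIds = new ArrayList<>(productIds);
        this.pageSize = pageSize;
    }

    // Takes the base type, so callers passing a plain Query keep working,
    // while a ComplexQuery gets its extra behaviour and is paged as well.
    List<String> pagedQuery(Query query, int pageNum) {
        List<String> ordered = new ArrayList<>(productIds);
        Collections.sort(ordered);
        if (query instanceof ComplexQuery && ((ComplexQuery) query).descending) {
            Collections.reverse(ordered);
        }
        int from = (pageNum - 1) * pageSize; // pages are 1-based in this sketch
        if (pageNum < 1 || from >= ordered.size()) {
            return new ArrayList<>();
        }
        int to = Math.min(from + pageSize, ordered.size());
        return new ArrayList<>(ordered.subList(from, to));
    }

    public static void main(String[] args) {
        PagedCatalogSketch catalog =
            new PagedCatalogSketch(Arrays.asList("p1", "p2", "p3", "p4", "p5"), 2);
        ComplexQuery complex = new ComplexQuery();
        complex.descending = true;
        System.out.println(catalog.pagedQuery(new Query(), 1));
        System.out.println(catalog.pagedQuery(complex, 1));
    }
}

// Simplified stand-in for the base query type.
class Query {
}

// Stand-in for ComplexQuery; "descending" is a made-up option.
class ComplexQuery extends Query {
    boolean descending = false;
}
```

The point of the design is that no new method is needed on the catalog
interface: the existing entry point accepts the subtype, and each catalog
can decide how much of the complex behaviour it supports.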

> My first instinct is to add a complexQuery method to the catalog
> interface (bad as it breaks older interfaces),

Yep, I wouldn't be in support of that at the Catalog level.

> or sub-interface the catalog interface and add this method (better
> because old catalogs would work as they do now), but seeing as you would
> like me to move this feature up to apache (assuming we can properly page
> it), perhaps you have a better solution that will keep our branch more
> compatible with apache, so I have less work to do to migrate my changes.

What do you think of my proposal above? That is, evolve pagedQuery to
understand ComplexQuery (and thus get the advantage of having complex
queries paged, which we're currently missing).

Cheers,
Chris

>
> On 11.10.2011, at 21:00, Chris A Mattmann wrote:
>
>> Hi All,
>>
>> On Oct 11, 2011, at 7:40 PM, Brian Foster wrote:
>>
>>> the problem with implementing an equals and hashCode function for the
>>> Product object is that it is not always created from db data... many of
>>> the objects in the structs package are 'fill what I know at the
>>> moment'... no guarantee that any one member variable in the object will
>>> always be set...
>>
>> I totally agree with Brian on this. The lifecycle of any one of the FM
>> objects in o.a.oodt.cas.filemgr.structs (and furthermore in any
>> o.a.oodt.*.structs package) is that any of the fields of the object may
>> (or may not) be filled at any point in time. It really depends on the
>> lifecycle of the object and on its downstream use in a service, in the
>> core, or in some extension point. The objects are meant to be
>> lightweight, and not representative of the *full* set of information at
>> any point in time unless absolutely necessary, which keeps the total
>> system footprint low.
>>
>>> for instance when a Product is created on the client side for an ingest
>>> the productId is not set until after ingestion... also the current trunk
>>> filemgr's Product object doesn't have an ingested or received time
>>> attached to it... at least the last time I checked it didn't... lol...
>>
>> +1, you are right, it's still that way, for the above stated reasons.
>>
>>> so an equals method which say just checked against productId and
>>> productName could give a false positive in some cases... for example
>>> making two sequential calls to getProductById() then calling equals
>>> (assuming we implemented it) on the 2 Product objects returned would
>>> return true... but if the Product was updated between the 2 calls,
>>> equals really should return false because the first Product object is
>>> out of date...
>>
>> +1
>>
>>> and doing a deep equals on the Product object would make the operation
>>> expensive... the Product object is more meant to be an information
>>> carrier... I would recommend storing your Products in a
>>> Map<String,Product> where the String key is ProductId
>>
>> +1, agreed. Using a Map<String, Product> structure is a good way to
>> obviate this, and then to define some locally unique key function for
>> that map (or accept the uniqueness of the product ID, which *should be*
>> unique at least within a single FM catalog).
>>
>> Cheers,
>> Chris
>>
>>> On Oct 11, 2011, at 4:46 PM, "Starch, Michael D (388L)" <Michael.D.Starch@jpl.nasa.gov> wrote:
>>>
>>>> Chris et al.,
>>>>
>>>> Do you see any problems overriding the default equals and hashCode
>>>> methods in the Product class (checking by memory address/reference)
>>>> with something that checks to see if the products logically represent
>>>> the same thing (same id, name, etc)?
>>>>
>>>> My issue is the following: I receive data back from the database with
>>>> multiple lines representing a single product (this is a database
>>>> thing, and the desired behavior).  Thus if I iterate across the
>>>> results, I will get multiple Product objects that represent one real
>>>> Product (and contain equivalent member variables).  In essence they
>>>> are the same "Product".  I can write cleaner, faster code to combine
>>>> the results if I can test them for equality and hash them directly,
>>>> without first pulling out the productName or Id.
>>>>
>>>> This will be a problem if there is some code that expects two
>>>> "Products" that have identical member variables to fail the equality
>>>> test if they are distinct objects.
>>>>
>>>> Thanks,
>>>>
>>>> -Michael
>>>>
>>
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattmann@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>

