oodt-dev mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: Overriding equals/hashCode for the Product class
Date Wed, 12 Oct 2011 17:45:06 GMT
Hey Michael,

On Oct 12, 2011, at 9:43 AM, Starch, Michael D (388L) wrote:

> 
> 
> Here is another related question:  
> 
> In our branch we have three catalog functions that have very similar database back ends:
> Query, ComplexQuery, and PagedQuery.  Unfortunately, complexQuery performs its work by first
> running "query" and then running individual metadata requests for each id returned.  This
> is inefficient from a database perspective, as you are running many queries when a single
> query would suffice (and that single query was already run once to get the list to begin with).

All of those functions perform that way, right? IOW, don't query and pagedQuery also work
that way?

But yes, I believe we could optimize it by reducing the number of times we have to query.
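
To make the round-trip difference concrete, here is a minimal Java sketch. CountingStore and
QueryLoopSketch are hypothetical in-memory stand-ins invented for illustration, not filemgr
classes, and the sketch only counts requests; it says nothing about real database timings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the catalog database; it only counts round trips.
class CountingStore {
    private final Map<String, String> metadataById = new LinkedHashMap<>();
    int roundTrips = 0;

    void put(String productId, String metadata) {
        metadataById.put(productId, metadata);
    }

    // Stands in for the initial "query" call that returns the matching product IDs.
    List<String> queryIds() {
        roundTrips++;
        return new ArrayList<>(metadataById.keySet());
    }

    // Stands in for one metadata request for a single product ID.
    String metadataFor(String productId) {
        roundTrips++;
        return metadataById.get(productId);
    }

    // Stands in for a single request that returns IDs and metadata together.
    Map<String, String> queryWithMetadata() {
        roundTrips++;
        return new LinkedHashMap<>(metadataById);
    }
}

class QueryLoopSketch {
    // Pattern described in the thread: one query, then one request per returned ID.
    static Map<String, String> perIdLoop(CountingStore store) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String id : store.queryIds()) {
            out.put(id, store.metadataFor(id));
        }
        return out;
    }

    // Alternative under discussion: fetch everything in one request.
    static Map<String, String> singleRequest(CountingStore store) {
        return store.queryWithMetadata();
    }
}
```

For N matching products, perIdLoop makes N + 1 round trips and singleRequest makes one. Whether
that matters in practice depends on how cheap each indexed lookup is, so it still needs
measuring.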

> 
> According to our DBA we will see big gains if we eliminate this loop, and the complexities
> of sorting the metadata have been solved (that was what prompted my previous question).

It would be really awesome to gather some metrics on this. What's interesting is that the
WHERE clauses of the subsequent queries should be over product IDs, which are themselves
indexed, so they should not be too computationally expensive. They certainly involve
computation, but I wonder how much you'll actually gain for the cost of engineering around
this, and whether the gain is negligible.

> Unfortunately, that means moving some complex query code into the catalog, and thus needing
> to return 2 completely different types from one method "query".

I think the longer-term solution would be to make complexQuery itself a paged method, and maybe
even to get rid of complexQuery and evolve pagedQuery to take a ComplexQuery object (right
now it takes a Query, but ComplexQuery extends Query, right?). Yes, this would involve making
the other catalogs support this, but it's probably more architecturally sound in the end.
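
A rough Java sketch of that idea follows. All class names here (SketchQuery,
SketchComplexQuery, PagedCatalogSketch) are hypothetical stand-ins, not the real filemgr
types, and the name filter and sort flag are invented just to give the complex query something
extra to carry. The point is only that a paged method typed on the base query class accepts
the subclass too.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins; not the real filemgr Query/ComplexQuery classes.
class SketchQuery {
    final String nameContains;

    SketchQuery(String nameContains) {
        this.nameContains = nameContains;
    }
}

class SketchComplexQuery extends SketchQuery {
    final boolean sortDescending; // an extra option only complex queries carry

    SketchComplexQuery(String nameContains, boolean sortDescending) {
        super(nameContains);
        this.sortDescending = sortDescending;
    }
}

class PagedCatalogSketch {
    private final List<String> productNames;
    private final int pageSize;

    PagedCatalogSketch(List<String> productNames, int pageSize) {
        this.productNames = new ArrayList<>(productNames);
        this.pageSize = pageSize;
    }

    // One entry point: plain and complex queries are both paged.
    // pageNum is 1-based; an out-of-range page yields an empty list.
    List<String> pagedQuery(SketchQuery query, int pageNum) {
        List<String> hits = new ArrayList<>();
        for (String name : productNames) {
            if (name.contains(query.nameContains)) {
                hits.add(name);
            }
        }
        Collections.sort(hits);
        if (query instanceof SketchComplexQuery
                && ((SketchComplexQuery) query).sortDescending) {
            Collections.reverse(hits);
        }
        int from = (pageNum - 1) * pageSize;
        if (pageNum < 1 || from >= hits.size()) {
            return new ArrayList<>();
        }
        int to = Math.min(from + pageSize, hits.size());
        return new ArrayList<>(hits.subList(from, to));
    }
}
```

Existing callers that pass a plain query keep working unchanged, while callers that pass the
subclass get the extra behavior and paging from the same method.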

On the other hand, it doesn't make your life a whole lot easier, so I could understand if
your answer was: "Don't have time at this point."

> My first instinct is to add a complexQuery method to the catalog interface (bad, as it
> breaks older interfaces),

Yep, I wouldn't be in support of that at the Catalog level.

> or sub-interface the catalog interface and add this method (better because old catalogs
> would work as they do now), but seeing as you would like me to move this feature up to Apache
> (assuming we can properly page it), perhaps you have a better solution that will keep our
> branch more compatible with Apache, so I have less work to do to migrate my changes.

What do you think of my proposal above, to evolve pagedQuery to understand ComplexQuery (and
thus to get the advantage of paged complex queries, which we're currently missing)?

Cheers,
Chris

> 
> On 11.10.2011, at 21:00, Chris A Mattmann wrote:
> 
>> Hi All,
>> 
>> On Oct 11, 2011, at 7:40 PM, Brian Foster wrote:
>> 
>>> the problem with implementing an equals and hashCode function for the Product
>>> object is that it is not always created from db data... many of the objects in the structs
>>> package are 'fill what I know at the moment'... no guarantee that any one member variable
>>> in the object will always be set...
>> 
>> I totally agree with Brian on this. The lifecycle of any one of the FM objects in
>> the o.a.oodt.cas.filemgr.structs (and furthermore in any o.a.oodt.*.structs package) is that
>> any of the fields of the object may (or may not) be filled at any point in time. It really
>> depends on the lifecycle of the object, and its downstream use in a service, in the
>> core, or in some extension point. The objects are meant to be light-weight, and not
>> representative of the *full* set of information at any point in time unless absolutely
>> necessary (thereby lowering the total system footprint, etc.).
>> 
>>> for instance when a Product is created on the client side for an ingest the productId
>>> is not set until after ingestion... also the current trunk filemgr's Product object doesn't
>>> have an ingested or received time attached to it... at least the last time I checked it
>>> didn't... lol...
>> 
>> +1, you are right, it's still that way, for the above stated reasons.
>> 
>>> so an equals method which, say, just checked against productId and productName
>>> could give a false positive in some cases... for example making two sequential calls to
>>> getProductById() then calling equals (assuming we implemented it) on the 2 Product objects
>>> returned would return true... but if the Product was updated between the 2 calls, equals
>>> really should return false because the first Product object is out of date...
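
The false positive described above can be reproduced with a small Java sketch. ProductSnapshot
is a hypothetical stand-in, not the real filemgr Product, and its transferStatus field and the
status strings used below are arbitrary examples of a value that might change between two
fetches.

```java
import java.util.Objects;

// Hypothetical stand-in for a Product as fetched at one moment; not the real filemgr class.
class ProductSnapshot {
    final String productId;
    final String productName;
    final String transferStatus; // example of a field that can change between two fetches

    ProductSnapshot(String productId, String productName, String transferStatus) {
        this.productId = productId;
        this.productName = productName;
        this.transferStatus = transferStatus;
    }

    // Identity-style equality: only ID and name are compared, other fields are ignored.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ProductSnapshot)) {
            return false;
        }
        ProductSnapshot other = (ProductSnapshot) o;
        return Objects.equals(productId, other.productId)
                && Objects.equals(productName, other.productName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(productId, productName);
    }
}
```

Two snapshots such as new ProductSnapshot("p1", "granule", "TRANSFERING") and
new ProductSnapshot("p1", "granule", "RECEIVED") compare equal here even though the first is
out of date. Two not-yet-ingested objects with a null ID and the same name would also compare
equal, which is the "fill what I know at the moment" problem in another form.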
>> 
>> +1
>> 
>>> and doing a deep equals on the Product object would make the operation expensive...
>>> the Product object is more meant to be an information carrier... I would recommend storing
>>> your Products in a Map<String,Product> where the String key is ProductId
>> 
>> +1, agreed. Using a Map<String, Product> structure is a good way to obviate
>> this, and then to define some locally unique key function inside of that map (or accept
>> the uniqueness of the product ID which *should be* unique at least within a single FM catalog).
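
A minimal Java sketch of the map-based approach, for combining several result rows that
describe the same product. SketchProduct and ProductMapSketch are hypothetical stand-ins (the
real Product class is not used here), and keeping the first object seen per ID is an
assumption; a real merge might combine fields instead.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the filemgr Product; only the fields needed here.
class SketchProduct {
    private final String productId;
    private final String productName;

    SketchProduct(String productId, String productName) {
        this.productId = productId;
        this.productName = productName;
    }

    String getProductId() {
        return productId;
    }

    String getProductName() {
        return productName;
    }
}

class ProductMapSketch {
    // Collapse rows that describe the same product into one entry per product ID.
    // The first object seen for an ID is kept; later duplicates are dropped.
    static Map<String, SketchProduct> indexById(List<SketchProduct> rows) {
        Map<String, SketchProduct> byId = new LinkedHashMap<>();
        for (SketchProduct p : rows) {
            if (p.getProductId() == null) {
                continue; // no ID yet (e.g. not ingested), so there is no key to index on
            }
            byId.putIfAbsent(p.getProductId(), p);
        }
        return byId;
    }
}
```

This gives hash-based lookup and de-duplication without touching equals or hashCode on the
product class itself, so code that relies on reference equality is unaffected.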
>> 
>> Cheers,
>> Chris
>> 
>>> On Oct 11, 2011, at 4:46 PM, "Starch, Michael D (388L)" <Michael.D.Starch@jpl.nasa.gov> wrote:
>>> 
>>>> Chris et al.,
>>>> 
>>>> Do you see any problems with overriding the default equals and hashCode methods
>>>> in the Product class (which check by memory address/reference) with something that checks
>>>> whether the products logically represent the same thing (same id, name, etc.)?
>>>> 
>>>> My issue is the following: I receive data back from the database, with multiple
>>>> lines representing a single product (this is a database thing, and the desired behavior).
>>>> Thus if I iterate across the results, I will get multiple Product objects that represent
>>>> one real Product (and contain equivalent member variables).  In essence they are the same
>>>> "Product".  I can write cleaner, faster code to combine the results if I can test them for
>>>> equality and hash them directly, without first pulling out the productName or Id.
>>>> 
>>>> This will be a problem if there is some code that expects two "Products"
>>>> that have identical member variables to fail the equality test if they are distinct objects.
>>>> 
>>>> Thanks,
>>>> 
>>>> -Michael
>>>> 
>> 
>> 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattmann@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 
> 



