commons-dev mailing list archives

From "Mark R. Diggory" <mdigg...@latte.harvard.edu>
Subject Re: [math] Apples or Oranges
Date Mon, 16 Jun 2003 19:40:00 GMT
Tim O'Brien wrote:

>On Mon, 16 Jun 2003, Mark R. Diggory wrote:
>  
>
>>(1) Bean etiquette suggests "getters" are for bean properties; it's 
>>usually recommended that they do nothing more than 
>>return the value for a property. This is beneficial in our Univariate 
>>case when calling a getter many times without adding a new value (let's 
>>say you use "getKurtosis" a lot in a calculation before adding another 
>>value); then it's more logical to have the kurtosis calculated only once 
>>and put the code for calculating it in the addValue method.
>>    
>>
>
>These objects are not JavaBeans, but using getXXX naming standards does
>provide some benefits (say create a Univariate instance and reference it
>from EL, Velocity, etc...).  I don't see any problems violating the 
>standard for bean properties as these are not really "properties".
>  
>
Yes, just as long as we all agree that these are not really Java Beans, 
then I'm ok with it too.

>>(2) However, if calling addValue many times (more likely the case) with 
>>only the interest of getting the "getMean" back, it's wasted 
>>computational time to calculate all the other stats (like kurtosis) in 
>>addValue when you just want the results of "getMean" back after each 
>>"addValue".
>>    
>>
>
>It is important to remember that in some of the stored univariate 
>instances the storage medium is external to the Univariate instance.  In 
>those cases, I don't see us being able to consolidate any of our 
>calculations in addValue().  In other words, ListUnivariateImpl is simply 
>attached to an external List - a user can go ahead and add 100 values to 
>that list without ListUnivariateImpl's involvement.
>  
>
I'm talking strictly about UnivariateImpl at this time; I'm not quite 
ready to delve into the storage implementations. I understand and value 
the benefit of what you're pointing out. Storage-based Univariate 
implementations have different requirements than "UnivariateImpl" from 
this standpoint. But I do think some aspects of what Andreou is pointing 
out could optimize those implementations in the future too. It could be 
possible to establish a sort of "concurrentModification" style check in 
addValue such that if the underlying List or Array was modified, it 
could be detected by the Univariate implementation and such a 
"caching" mechanism could be updated (I'm not sure though; this may not 
be something to explore before reaching release).
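
A rough sketch of what such modification detection might look like, assuming 
the backing list can be wrapped so that it counts its own changes. TrackedList 
and CachedMean are hypothetical names for illustration only, not existing 
[math] classes; a plain java.util.List does not expose a public modification 
count, so some wrapper of this kind would be needed.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical list wrapper that bumps a version counter on every change. */
class TrackedList extends AbstractList<Double> {
    private final List<Double> delegate = new ArrayList<>();
    private long version = 0;

    @Override public Double get(int index) { return delegate.get(index); }
    @Override public int size() { return delegate.size(); }

    @Override public Double set(int index, Double value) {
        version++;
        return delegate.set(index, value);
    }

    // AbstractList.add(E) delegates to this method, so plain add() is counted too.
    @Override public void add(int index, Double value) {
        version++;
        delegate.add(index, value);
    }

    @Override public Double remove(int index) {
        version++;
        return delegate.remove(index);
    }

    long version() { return version; }
}

/** Hypothetical statistic holder that recomputes only when the list has changed. */
class CachedMean {
    private final TrackedList values;
    private long seenVersion = -1;          // version at the last computation
    private double cachedMean = Double.NaN;
    int recomputations = 0;                 // exposed only to make the caching visible

    CachedMean(TrackedList values) { this.values = values; }

    double getMean() {
        if (seenVersion != values.version()) {
            double sum = 0.0;
            for (double v : values) {
                sum += v;
            }
            cachedMean = values.isEmpty() ? Double.NaN : sum / values.size();
            seenVersion = values.version();
            recomputations++;
        }
        return cachedMean;
    }
}
```

With this shape, a user can modify the list directly and the next getMean() 
call notices the changed version and recomputes; repeated calls in between 
return the stored value.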

Andreou Andreas wrote:

> Mark, I would go for the latter approach (the one in the p.s.) because 
> it doesn't seem that complex to me...
> Why not add a CachableUnivariateImpl class
> that extends UnivariateImpl
> and also keeps the results of the getters (getMean, 
> getKurtosis, etc.) in a cache?
> In this way, whenever a new value is added, the cache will be cleared, 
> and on calling the getters, each corresponding statistic will be
> recalculated.
> If no new values have been added, this new subclass will just return 
> the cached results... 


Yes, I think this is a novel idea to explore in the future. It's 
difficult to draw the line on what to store in it because, at this time, 
we are calculating the mean/variance in addValue with Al's new 
2-pass algorithm, while the more complex kurt and skew calculations are 
in the getter methods. But I like the idea. I'm working on 2-pass 
style algorithms for skew and kurt now, which may unfortunately require 
more calculation to occur in addValue than I want to see happening.
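
To make the caching idea concrete, here is a minimal sketch under some 
assumptions. SimpleUnivariate is a hypothetical stand-in that stores its 
values and recomputes on every getter call; it is not the real UnivariateImpl. 
CachingUnivariate plays the role of the proposed subclass. The kurtosis shown 
is the plain population form m4 / m2^2 - 3, chosen only for brevity and not 
necessarily the formula [math] uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Univariate: recomputes on every getter call. */
class SimpleUnivariate {
    protected final List<Double> values = new ArrayList<>();

    public void addValue(double v) {
        values.add(v);
    }

    public double getMean() {
        if (values.isEmpty()) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    /** Population excess kurtosis, m4 / m2^2 - 3 (illustrative form only). */
    public double getKurtosis() {
        if (values.isEmpty()) {
            return Double.NaN;
        }
        double mean = getMean();
        double m2 = 0.0;
        double m4 = 0.0;
        for (double v : values) {
            double d = v - mean;
            m2 += d * d;
            m4 += d * d * d * d;
        }
        int n = values.size();
        m2 /= n;
        m4 /= n;
        return m4 / (m2 * m2) - 3.0;
    }
}

/** Sketch of the proposed subclass: getters are cached until the next addValue. */
class CachingUnivariate extends SimpleUnivariate {
    private Double cachedMean;       // null means "not computed since last addValue"
    private Double cachedKurtosis;
    int kurtosisComputations = 0;    // exposed only to make the caching visible

    @Override public void addValue(double v) {
        super.addValue(v);
        cachedMean = null;           // any new value invalidates every cached statistic
        cachedKurtosis = null;
    }

    @Override public double getMean() {
        if (cachedMean == null) {
            cachedMean = super.getMean();
        }
        return cachedMean;
    }

    @Override public double getKurtosis() {
        if (cachedKurtosis == null) {
            cachedKurtosis = super.getKurtosis();
            kurtosisComputations++;
        }
        return cachedKurtosis;
    }
}
```

In this arrangement addValue stays cheap (store the value, clear the cache), 
and each statistic is computed at most once between additions, which covers 
both the "many getKurtosis calls" case and the "many addValue calls, only 
getMean wanted" case.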

-Mark

