commons-dev mailing list archives

From "Phil Steitz" <>
Subject Re: [math] Some issues with DoubleArrays
Date Mon, 23 Jun 2003 16:09:35 GMT
Mark R. Diggory wrote:
> Phil Steitz wrote:
>> Tim O'Brien wrote:
>>> What about this possibility?  We could easily have DoubleArray return 
>>> a reference to the internalStorageArray.  I know this would violate 
>>> encapsulation, but if we expose the internal array, the start and end 
>>> index, then there is no need to copy the contents of the array.  
>>> Instead we pass a reference to an existing array - aka, no need to 
>>> copy our element array.
>> +1 -- it *is* after all an array and if this is not exposed, you are 
>> always going to be stuck with using ArrayCopy to get at the underlying 
>> data, which makes efficient computation using large arrays impossible. 
>> I agonized over this same decision vis a vis RealMatrixImpl, where I 
>> ended up "breaking encapsulation" (similarly to other double[][]-based 
>> implementations) and exposing a getDataRef method that returns a 
>> reference to the underlying double[][] array.
> I like it too. Since I've been looking at/messing with these classes, 
> I'd be glad to make the changes for us and add the static methods to 
> StatUtils. One note: I think we should retain a method that copies 
> the array as well as create one that exposes it. The copy version can 
> provide us with an array that is trimmed down to the size of the 
> actual content. Because the internal store increases "incrementally" 
> in the windowless case, there can be uninitialized/unused sections at 
> the end of the array (likewise, in the windowed case, if the array 
> isn't filled yet, there are unused sections). Providing an interface 
> to retrieve a "cleaned" array is a useful option if one wants to 
> retrieve the data to manipulate it elsewhere. This would be useful in 
> both Fixed and Exp/Cont DoubleArrays.

Yes.  I would certainly not recommend dropping the existing 
getElements() or replacing it with reference semantics.  What I did in 
RealMatrix was to provide both getData and getDataRef, with the latter 
returning a reference.  I would reserve getElements() for copy semantics 
and call the reference version something else.
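
A minimal sketch of the two access styles discussed above, assuming a simplified expandable array. The class name, the field names, and the reference accessor getValuesRef() are hypothetical and are not taken from the actual DoubleArray implementations; only getElements() is named in the thread.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the actual commons-math DoubleArray code.
final class ExpandableDoubleArraySketch {
    private double[] internalArray = new double[16];
    private int startIndex = 0;   // index of the first live element
    private int numElements = 0;  // number of live elements

    void addElement(double value) {
        if (startIndex + numElements == internalArray.length) {
            // Grow the backing store; slots past the live region remain unused.
            internalArray = Arrays.copyOf(internalArray, internalArray.length * 2);
        }
        internalArray[startIndex + numElements] = value;
        numElements++;
    }

    /** Copy semantics: a new array trimmed to exactly the live elements. */
    double[] getElements() {
        return Arrays.copyOfRange(internalArray, startIndex, startIndex + numElements);
    }

    /** Reference semantics (hypothetical name): the backing array itself, no copy. */
    double[] getValuesRef() {
        return internalArray;
    }

    /** Offset of the first live element within the backing array. */
    int start() {
        return startIndex;
    }

    /** Count of live elements; the backing array may be longer than this. */
    int getNumElements() {
        return numElements;
    }
}
```

A caller using the reference version has to pass start() and getNumElements() along with the array, since the backing array can contain unused slots.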

>>> Now, every method that takes a double[] in StatUtils would be altered 
>>> to take a (double[], int start, int length).   So,
>>> public static double sum(double[] values);
>>> would delegate to a more "generic"
>>> public static double sum(double[] values, int startIndex, int length);
>> I agree -- I think that Brent suggested this improvement already.
> On the topic of StatUtils, what are the opinions about adding the 
> following methods from my discussion with the lang group to provide 
> alternate primitive implementations? These would be for short, long, 
> int, float for now.

I don't see any harm in adding these; but I would not put a high 
priority on implementing them and I agree with Stephen that there is no 
harm in lang including the min/max functions directly in lang.math as 
well.  Some duplication across packages is OK, IMHO.  Also, I would not 
want lang -- or any other component -- to depend on anything in math 
until we have successfully emerged from the sandbox with a release. 
What may actually make more sense is for lang.math to add the min, max 
stuff and for us to use their implementations in place of our own. 
But, once again, these are trivial functions and I see nothing wrong 
with implementing them in both places.  Note that in any case, we will 
want to implement these with array offset arguments, which lang may not 
be interested in.
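
The delegation to an offset-taking overload quoted earlier could look roughly like this. The two sum signatures come from the thread; the class name and the argument checking are assumptions made for the sketch.

```java
// Hypothetical sketch of the delegating overloads; not the actual StatUtils source.
final class StatUtilsSketch {
    private StatUtilsSketch() {
    }

    /** Convenience form: sums the whole array by delegating to the ranged form. */
    public static double sum(double[] values) {
        return sum(values, 0, values.length);
    }

    /** Sums values[startIndex] through values[startIndex + length - 1]. */
    public static double sum(double[] values, int startIndex, int length) {
        // Assumed policy: reject ranges that fall outside the array.
        if (startIndex < 0 || length < 0 || length > values.length - startIndex) {
            throw new IllegalArgumentException("range out of bounds");
        }
        double total = 0.0;
        for (int i = startIndex; i < startIndex + length; i++) {
            total += values[i];
        }
        return total;
    }
}
```

With the ranged form in place, a caller holding a reference to a larger backing array can sum just the live region without copying it first.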

One more note on the min-max stuff: the implementation in StatUtils 
calls Math.min/max each time through the comparison loop. The loop 
should probably be rewritten to just keep track of the min/max and do a 
straight compare each time through (similar to what UnivariateImpl does) 
to avoid the unnecessary function call within the loop.
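
A minimal sketch of the loop shape suggested here, with array offset arguments. The class name and the handling of empty or invalid ranges are assumptions; this is not the UnivariateImpl or StatUtils code.

```java
// Hypothetical sketch: track the running min/max with a plain comparison.
final class MinMaxSketch {
    private MinMaxSketch() {
    }

    public static double min(double[] values, int startIndex, int length) {
        checkRange(values, startIndex, length);
        double min = values[startIndex];
        for (int i = startIndex + 1; i < startIndex + length; i++) {
            if (values[i] < min) {  // straight compare, no Math.min call
                min = values[i];
            }
        }
        return min;
    }

    public static double max(double[] values, int startIndex, int length) {
        checkRange(values, startIndex, length);
        double max = values[startIndex];
        for (int i = startIndex + 1; i < startIndex + length; i++) {
            if (values[i] > max) {  // straight compare, no Math.max call
                max = values[i];
            }
        }
        return max;
    }

    // Assumed policy: an empty or out-of-bounds range is an error.
    private static void checkRange(double[] values, int startIndex, int length) {
        if (startIndex < 0 || length <= 0 || length > values.length - startIndex) {
            throw new IllegalArgumentException("empty or out-of-bounds range");
        }
    }
}
```

One behavioral difference to keep in mind: Math.min and Math.max return NaN if either argument is NaN, whereas the plain comparison above skips a NaN that appears after the first element, so the two versions are not interchangeable for data containing NaN.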

> primitive <-- min(primitive[])
> primitive <-- max(primitive[])
> primitive <-- sum(primitive[])
> primitive <-- sumSq(primitive[])
> in terms of other stat methods the theme would be more like:
> double <-- mean(primitive[])
> double <-- var(primitive[])
> double <-- std(primitive[])
> Possibly similar methods for other stat methods; these would all involve 
> casting the elements to double prior to calculating?

Yes, you would have to cast before computation, which sort of blows away 
the value of the array-based implementation.  It may be better to add 
addValue(primitive[]) to Univariate.  I have been meaning to suggest 
addValue(double[]) for a while now.
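
A rough sketch of what bulk addValue overloads might look like on a running-statistics accumulator. The class below is a hypothetical stand-in, not the actual Univariate interface; only the addValue name comes from the thread. Each element of a non-double array is widened as it is added, so no intermediate double[] copy is needed.

```java
// Hypothetical stand-in for a Univariate-style accumulator.
final class RunningStatsSketch {
    private int n = 0;
    private double sum = 0.0;
    private double sumSq = 0.0;

    /** Adds a single observation. */
    public void addValue(double value) {
        n++;
        sum += value;
        sumSq += value * value;
    }

    /** Bulk form for double[]: feeds each element to addValue(double). */
    public void addValue(double[] values) {
        for (double v : values) {
            addValue(v);
        }
    }

    /** Bulk form for int[]: each element is widened to double as it is added. */
    public void addValue(int[] values) {
        for (int v : values) {
            addValue((double) v);
        }
    }

    public int getN() {
        return n;
    }

    public double getSum() {
        return sum;
    }

    public double getSumsq() {
        return sumSq;
    }

    /** Mean of the values added so far; NaN when nothing has been added. */
    public double getMean() {
        return n == 0 ? Double.NaN : sum / n;
    }
}
```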


