Reply-To: "Commons Developers List" <dev@commons.apache.org>
Subject: Re: [math] Should this throw a NO_DATA exception?
To: Commons Developers List <dev@commons.apache.org>
References: <56905737.7040505@gmail.com>
 <31e7b16fb6b9190ecc7d50702218466f@scarlet.be> <5690A859.90002@gmail.com>
 <76cef62496968f83eb664cb48503e7a9@scarlet.be>
From: Ole Ersoy <ole.ersoy@gmail.com>
Message-ID: <56915745.9070006@gmail.com>
Date: Sat, 9 Jan 2016 12:53:57 -0600
In-Reply-To: <76cef62496968f83eb664cb48503e7a9@scarlet.be>

Hi,

On 01/09/2016 06:21 AM, Gilles wrote:
[...]
> But we should know the target of the improvement.
> I mean, is it a drop-in replacement of the current "RealVector"?
OK - I think it's probably confusing because I posted JDK8 examples earlier.  I'm just wondering whether the current RealVector norm methods should throw a NO_DATA exception when the vector has no entries.  I think they should.

>
> If so, how can it happen before we agree that Java 8 JDK can be
> used in the next major release of CM?
At some point I'm sure CM will switch over, so we can start experimenting with features now.

>
> If it's a redesign, maybe we should define a "wish list" of what
> properties should belong to which concept.
I think that is a good inclusive approach for a community.  My primary wishes are:
- Remove inheritance when possible in order to keep it simple (Possibly at the expense of generic use)
- Design classes that are focused on doing small simple things
- Modularize (I could list all the benefits, but I think we know them).  The longer CM takes to do this the harder it will be.  Every single time someone sprinkles in FastMath it gets a little harder...

So in general just keep it simple.  If it needs to support other requirements then:

Reuse operations from a FunctionXXX class.  Support new forms of state in a new module.

> E.g. for a "matrix" it might be useful to be mutable (as per
> previous discussions on the subject),

I think the approach here should be very strict.  For example ArrayRealVector has almost half the code dedicated to mutation that can easily be done elsewhere.  I think this was done because CM is not modular.  We can't defer to an array module for the manipulations so they had to be baked into ArrayRealVector.

> but for a (geometrical)
> vector it might be interesting to not be (as in the case for
> "Vector3D" in the "geometry" package).
>
> The "matrix" concept probably requires a more advanced interface
> in order to allow efficient implementations of basic operations
> like multiplication...
Yes - For example, when multiplying a sparse matrix by a sparse vector?  Or a dense vector by a sparse matrix?  Etc.  I'm hoping there's a very simple way to accomplish this without using inheritance.

>
>>>
>>> There is an issue on the bug-tracking system that started to
>>> collect many of the various problems (specific and general)
>>> of data containers ("RealVector", "RealMatrix", etc.) of the
>>> "o.a.c.m.linear" package.
>>>
>>>
>>> Perhaps it would be more useful, for future reference, to list
>>> everything in one place.
>> Sure - I think in this case though we can knock it out fast.
>> Sometimes when we list everything in one place people look at it, get
>> a headache, and start drinking :).  To me it seems like a vector that
>> is empty (no elements) is different from having a vector with 1 or
>> more 0d entries.  In the latter case, according to the formula, the
>> norm is zero, but in the first case, is it?
>
> To be on the safe side, it should be an error, but I've just had
> to let this kind of condition pass (cf. MATH-1300 and related on
> the implementation of "nextBytes(byte[],int,int)" feature).
>
>>
>>>
>>> On Fri, 8 Jan 2016 18:41:27 -0600, Ole Ersoy wrote:
>>>> public double getLInfNorm() {
>>>>         double norm = 0;
>>>>         Iterator<Entry> it = iterator();
>>>>         while (it.hasNext()) {
>>>>             final Entry e = it.next();
>>>>             norm = FastMath.max(norm, FastMath.abs(e.getValue()));
>>>>         }
>>>>         return norm;
>>>>     }
>>>
>>> The main problem with the above is that it assumes that the elements
>>> of a "RealVector" are Cartesian coordinates.
>>> There is no provision that it must be the case, and assuming it is
>>> then in contradiction with other methods like "append".
>>
>> While experimenting with the design of the current implementation I
>> ended up throwing the exception.  I think it's the right thing to do.
>> The net effect is that if someone creates a new ArrayVector(new
>> double[]{}), then the exception is thrown, so if they don't want it
>> thrown then they should new ArrayVector(new double[]{0}).   More
>> explanations of this design below ...
>
> I don't know at this point (not knowing the intended usage).
One way to look at it is to say "Conceptually it is not correct, but we are using it in a way that eliminates this flaw, so it's OK". Which I don't think is OK, unless we can say conclusively and globally that it's OK for all users in all cases.  In this case I think returning a zero norm when there is no data is wrong, and can potentially lead to wrong results.
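
To make the difference concrete, here is a minimal sketch that uses a plain double[] instead of RealVector.  The class name, the two method names, and the use of IllegalArgumentException in place of a NO_DATA MathException are stand-ins I made up for illustration; none of them is existing CM API.

```java
// Hypothetical sketch: plain arrays stand in for RealVector, and
// IllegalArgumentException stands in for a NO_DATA exception.
final class EmptyNormSketch {

    /** Mirrors the current loop: an empty input silently yields 0. */
    static double lInfNormLenient(double[] data) {
        double norm = 0;
        for (double x : data) {
            norm = Math.max(norm, Math.abs(x));
        }
        return norm;
    }

    /** Same computation, but rejects input that has no entries. */
    static double lInfNormStrict(double[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("NO_DATA: vector has no entries");
        }
        return lInfNormLenient(data);
    }

    public static void main(String[] args) {
        // Both calls return 0, so the caller cannot tell "no data" from "all zeros".
        System.out.println(lInfNormLenient(new double[] {}));
        System.out.println(lInfNormLenient(new double[] {0}));
        try {
            lInfNormStrict(new double[] {});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the lenient version, a caller that accidentally passes an empty array gets a plausible-looking number back; with the strict version the mistake surfaces at the call site.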

>
> [I think this is low-level discussion that is not impacting on the
> design but would fix an API at too early a stage.]
Yes I see your point there.  Why patch the roof if the house is getting demolished in two weeks?  CM seems to be really nice to all the interested parties with respect to this though.  Ubuntu provides long term supported releases.  Fedora releases every six months and discontinues updates for the previous releases, but CentOS picks up the slack there.

>
>>>
>>> At first (and second and third) sight, I think that these container
>>> classes should be abandoned and replaced by specific ones.
>>> For example:
>>> * Single "matrix" abstract type or interface for computations in
>>>   the "linear" package (rather than "vector" and "matrix" types)
>>> * Perhaps a "DoubleArray" (for such things as "append", etc.).
>>>   And by the way, there already exists "ResizableDoubleArray" which
>>>   could be a start.
>>> * Geometrical vectors (that can perhaps support various coordinate
>>>   systems)
>>> * ...
>>
>> I think we are thinking along the same lines here.  So far I have the
>> following:
>> A Vector interface with only these methods:
>> - getDimension()
>> - getEntry()
>> - setEntry()
>
> And what is the concept that is being represented by this interface?
It's just a simple vector (the type we see in Wolfram documentation): [x1, x2, x3, ..., xN].  Ideally it would just be an array, but it needs to throw a MathException.OUT_OF_RANGE when attempts are made to get or alter nonexistent entries.
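
Roughly, and only as a sketch: the three-method interface plus an array-backed implementation could look like the code below.  IndexOutOfBoundsException stands in for MathException.OUT_OF_RANGE, and copying the constructor argument is my assumption, not something settled in this thread.

```java
// Hypothetical sketch of the three-method interface; IndexOutOfBoundsException
// stands in for MathException.OUT_OF_RANGE.
interface Vector {
    int getDimension();
    double getEntry(int index);
    void setEntry(int index, double value);
}

/** Fixed-length vector backed by a copy of the given array (copying is an assumption). */
final class ArrayVector implements Vector {
    private final double[] data;

    ArrayVector(double[] data) {
        this.data = data.clone();
    }

    @Override
    public int getDimension() {
        return data.length;
    }

    @Override
    public double getEntry(int index) {
        checkIndex(index);
        return data[index];
    }

    @Override
    public void setEntry(int index, double value) {
        checkIndex(index);
        data[index] = value;
    }

    /** Rejects any index outside [0, dimension). */
    private void checkIndex(int index) {
        if (index < 0 || index >= data.length) {
            throw new IndexOutOfBoundsException(
                "OUT_OF_RANGE: index " + index + ", dimension " + data.length);
        }
    }
}
```

Entries can be overwritten, but nothing here changes the length, which keeps the class small.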

>
> I think that is necessary to list use-cases so that we don't again
> come up with a design that may prove not specific enough to satisfy
> some requirements of the purported audience.
We can do that.  With the above I'm just starting with the smallest building block possible.  We can add things if needed.  I'm hoping it's not needed, and that we can support additional requirements in another small class.

>
>> An ArrayVector implementation of Vector where the one and
>> only constructor takes a double[] array argument.  The vector length
>> cannot be mutated.  If someone wants to do that they have to create a
>> new one.
>
> Assuming we explore the 3 concepts I had listed above
> * it cannot be "matrix" (since I supposed that a row or column matrix
>   could be of type "matrix" not "vector")
So I think in general I prefer if one thing cannot be another.  It's simpler when the thing is just the thing.  With the latter it's easy to start getting convoluted.  Sometimes it's worth it ...but I think it's easier on designers, maintainers, and users most of the time when things are distinct.

So say we discover later that we think the Vector really should be a one dimensional Matrix.  That might be worth it, but ATM I don't see how to do it without making the function and vector interface more complex.


> * it cannot be an appendable sequence, since the size is fixed.
That's fine (I think...maybe there's a case that shows that this adds a lot of overhead?...) because it is an Array based vector. The size is inherently fixed.  If we want to change the size grab the underlying array and change it.  If the data structure is different, then still do the same thing.
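
For what it's worth, resizing outside the vector could look like the sketch below.  ResizeSketch and append are hypothetical names; the only real API used is java.util.Arrays.copyOf.  The caller would then wrap the returned array in a new vector.

```java
import java.util.Arrays;

// Hypothetical helper: the size change happens on the raw array,
// not inside the vector class.
final class ResizeSketch {

    /** Returns a new array holding the old entries followed by {@code value}. */
    static double[] append(double[] data, double value) {
        double[] grown = Arrays.copyOf(data, data.length + 1);
        grown[data.length] = value;
        return grown;
    }

    public static void main(String[] args) {
        double[] original = {1, 2};
        double[] grown = append(original, 3);
        // The original array is left untouched.
        System.out.println(Arrays.toString(original));
        System.out.println(Arrays.toString(grown));
    }
}
```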

>
> * it cannot be a geometrical vector since "getEntry(int)" and
>   "setEntry(int, double)" are too low level to ensure consistency
>   under transformations (since we cannot assume that the entries
>   would always be Cartesian coordinates).
Which I think is good.  I looked at OJAlgo and I found the use of generics a bit extreme.  Very little code is documented and the lack of simple examples suggests to me that it could be a lot simpler.

>
>>
>> A VectorFunctionFactory class containing methods that return Function
>> and BiFunction instances that can be used to perform vector mapping
>> and reduction.  For example:
>>
>>     /**
>>      * Returns a {@link Function} that produces the lInfNorm of the vector
>>      * {@code v} .
>>      *
>>      * Example {@code lInfNorm().apply(v);}
>>      * @throws MathException
>>      *             Of type {@code NO_DATA} if {@code v.getDimension()} == 0.
>>      */
>>     public static Function<Vector, Double> lInfNorm() {
>>         return lInfNorm(false);
>>     }
>>
>>     /**
>>      * Returns a {@link Function} that produces the lInfNorm of the vector
>>      * {@code v}.
>>      *
>>      * Example {@code lInfNorm(true).apply(v);}
>>      *
>>      * @param parallel
>>      *            Whether to perform the operation in parallel.
>>      * @throws MathException
>>      *             Of type {@code NO_DATA} if {@code v.getDimension()}  == 0.
>>      *
>>      */
>>     public static Function<Vector, Double> lInfNorm(boolean parallel) {
>>         return (v) -> {
>>             LinearExceptionFactory.checkNoVectorData(v.getDimension());
>>             IntStream stream = range(0, v.getDimension());
>>             stream = parallel ? stream.parallel() : stream;
>>             return stream.mapToDouble(i -> Math.abs(v.getEntry(i)))
>>                          .max().getAsDouble();
>>         };
>>     }
>
> This is a nice possibility, but without a purpose, it could seem that
> you just move the "operations" from the container class to another one.
The primary purpose is that we can use any of those operations without needing an instance of a class or inheriting it.
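
As a self-contained illustration of that point (a sketch only): VectorView and VectorFunctions below are hypothetical stand-ins for the Vector interface and VectorFunctionFactory, and IllegalArgumentException again stands in for a NO_DATA MathException.

```java
import java.util.function.Function;
import java.util.stream.IntStream;

// Hypothetical read-only stand-in for the Vector interface.
interface VectorView {
    int getDimension();
    double getEntry(int index);
}

// Hypothetical stand-in for VectorFunctionFactory.
final class VectorFunctions {
    private VectorFunctions() {}

    /** Returns a function computing the L-infinity norm of a vector. */
    static Function<VectorView, Double> lInfNorm(boolean parallel) {
        return v -> {
            if (v.getDimension() == 0) {
                throw new IllegalArgumentException("NO_DATA: vector has no entries");
            }
            IntStream indices = IntStream.range(0, v.getDimension());
            if (parallel) {
                indices = indices.parallel();
            }
            return indices.mapToDouble(i -> Math.abs(v.getEntry(i))).max().getAsDouble();
        };
    }

    public static void main(String[] args) {
        double[] data = {1.5, -4.0, 2.0};
        // Any object exposing the two methods works; no base class is involved.
        VectorView v = new VectorView() {
            public int getDimension() { return data.length; }
            public double getEntry(int i) { return data[i]; }
        };
        System.out.println(VectorFunctions.lInfNorm(false).apply(v)); // prints 4.0
    }
}
```

The caller only needs something that can report a dimension and an entry; the norm itself lives in one place and is not tied to any container hierarchy.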

> It's cleaner, certainly, but could it be that the factory will end up
> with as many conceptually incompatible operations as the current design?
Maybe?  My first goal is to be able to provide simple examples.  If there's something that can't be done, then I'll first design a simple API example that gets it done, and then consider how the implementation for that should be done.  Maybe that leads to some additional work, but I prefer that over cluttering classes or making them overly generic (Unless there's a really good strong reason).

>
>> So the design leaves more specialized structures like sparse matrices
>> to a different module.  I'm not sure if this is the best design, but
>> so far I'm feeling pretty good about it.  WDYT?
>
> So you were really working on the "matrix" design?
I'm looking at the whole linear package in general.
>
> Did you look at what the requirements are for these structures
> (e.g. for fast multiplication) and how they achieve it in other
> packages (e.g. "ojalgo")?
Yes I did have a look and I will look more.  The lack of simple examples was a bit of a deal-breaker for me.

>
> If it's not about "matrix" but about blocks of (possibly multi-dimensional)
> data that can be "mapped" and "reduced", perhaps that the one-dimensional
> version (which seems what your new "Vector" is) should just be a special
> case of an interface for this kind of structure (?).
Maybe.  My hunch is that, as is, it is very easy to work with and understand, and that if something more complex is needed then it should be built in another module.  I'm willing to change the whole approach if someone can demonstrate an example / concept that shows that the bulk of use cases cannot be satisfied without a lot of rework.

> [In this latter case, the CM "MultidimensionalCounter" (in package "util")
> might be something that can be reused (?).]
Does it fit with streams?  I'm seeing how far I can go with these ATM.  I need to get more educated on the MultidimensionalCounter.

Cheers,
Ole

