commons-dev mailing list archives

From Luc Maisonobe <Luc.Maison...@free.fr>
Subject Re: [math] Questions about the linear package
Date Wed, 14 Oct 2009 18:07:12 GMT
Jake Mannix wrote:
> Hi Luc,
> 
> 
> On Wed, Oct 14, 2009 at 3:01 AM, <luc.maisonobe@free.fr> wrote:
> 
>>>   * also for RealVector - No iterator methods?  So if the
>>> implementation is
>>> sparse, there's no way to just iterate over the non-zero entries?
>>> What's
>>> worse, you can't even subclass OpenMapVector and expose the iterator
>>> on the
>>> OpenIntToDoubleHashMap inner object, because it's private. :\
>> Good idea. You can use JIRA <https://issues.apache.org/jira/browse/MATH>
>> to register a request for implementing this. Patches are of course welcome.
>> There should probably be two iterators: one for all entries and one for the
>> non-default entries (which may be non-zeroes or non-NaN or anything else).
>>
> 
> I'll open up a ticket and attach a patch (with tests, naturally) later
> today.

Very good. Thank you.
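
To make the two-iterator idea concrete, here is a minimal sketch. It is not
[math] code: SparseVectorSketch, Entry and nonDefaultIterator() are
hypothetical names, and a TreeMap stands in for OpenIntToDoubleHashMap. It
only assumes that a sparse vector stores the entries that differ from some
default value.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical sketch: none of these names exist in commons-math 2.0.
final class SparseVectorSketch {

    /** One (index, value) pair produced during iteration. */
    static final class Entry {
        final int index;
        final double value;

        Entry(int index, double value) {
            this.index = index;
            this.value = value;
        }
    }

    private final int dimension;
    private final double defaultValue;
    // Only entries that differ from defaultValue are stored, keyed by index.
    private final TreeMap<Integer, Double> stored = new TreeMap<>();

    SparseVectorSketch(int dimension, double defaultValue) {
        this.dimension = dimension;
        this.defaultValue = defaultValue;
    }

    void setEntry(int index, double value) {
        if (index < 0 || index >= dimension) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // Double.compare also treats NaN as equal to a NaN default.
        if (Double.compare(value, defaultValue) == 0) {
            stored.remove(index);
        } else {
            stored.put(index, value);
        }
    }

    double getEntry(int index) {
        Double v = stored.get(index);
        return v == null ? defaultValue : v;
    }

    /** Visits every index from 0 to dimension - 1, default entries included. */
    Iterator<Entry> iterator() {
        return new Iterator<Entry>() {
            private int cursor = 0;

            public boolean hasNext() {
                return cursor < dimension;
            }

            public Entry next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Entry e = new Entry(cursor, getEntry(cursor));
                cursor++;
                return e;
            }
        };
    }

    /** Visits only the stored (non-default) entries, in index order. */
    Iterator<Entry> nonDefaultIterator() {
        final Iterator<Map.Entry<Integer, Double>> it = stored.entrySet().iterator();
        return new Iterator<Entry>() {
            public boolean hasNext() {
                return it.hasNext();
            }

            public Entry next() {
                Map.Entry<Integer, Double> e = it.next();
                return new Entry(e.getKey(), e.getValue());
            }
        };
    }
}
```

With this shape, an operation that maps the default value to itself can use
nonDefaultIterator(), while one that does not (cosine or exponential on a
zero default, for instance) has to fall back to iterator().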

> 
> 
>> This API is set up the way I got it from an external contributor, so I
>> guess he had a use case for that. I extended it to remain in the same spirit
>> and got this huge mess. I'm sorry for that. I agree a more generic method
>> would be interesting. Removing these methods would however introduce an
>> incompatible API change, so this could be done only in a major release (i.e.
>> 3.0) which is probably a long time from now.
>>
> 
> Yeah, this is why I'm sad I missed the refactoring push to hit 2.0.  For
> now, however, a lot of implementation pain could be avoided with the
> iterator() and iterateNonDefault(), together with a single
> AbstractRealVector which has a default implementation of all of these crazy
> methods, for implementations which don't need to think about them.

I think reading at least the first half of the May thread about MTJ
would also help everyone interested in this topic to see some of the
options already discussed. The thread is here:
<http://markmail.org/thread/a4lsywmh2i6mktkh>.

Luc

> 
> 
>> The generic method should also either be provided in two versions (all
>> entries and non-default entries) or it should have an iterator argument. For
>> example the cosine and exponential functions transform a zero entry into a
>> non-zero entry so they cannot ignore zero entries.
>>
>>>   * while we're at it, if there is map(), why not also double
>>> RealVector.collect(Collector()), where Collector defines void
>>> collect(int
>>> index, double value); and double result(); - this can be used for
>>> generic
>>> inner products and kernels (and can allow for consolidating all of
>>> the
>>> L1Norm(), norm(), and LInfNorm() methods into this same method,
>>> passing in
>>> different L1NormCollector() etc... instances).
>> Good idea too. Another JIRA ticket for that?
>>
> 
> JIRA ticket, tests, patch on the way.  Maybe today, we'll see. :)
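
For readers following along, the collector proposal could look roughly like
the sketch below. Only the two Collector methods come from the description
quoted above; CollectorSketch, the static collect() helper over a plain
double[], and the three collector classes are assumptions made up for
illustration, not existing [math] classes.

```java
// Hypothetical sketch of the proposed Collector idea; not part of commons-math 2.0.
final class CollectorSketch {

    /** Accumulates a scalar result while a vector's entries are visited. */
    interface Collector {
        void collect(int index, double value);
        double result();
    }

    /** Sum of absolute values. */
    static final class L1NormCollector implements Collector {
        private double sum;

        public void collect(int index, double value) {
            sum += Math.abs(value);
        }

        public double result() {
            return sum;
        }
    }

    /** Euclidean norm: square root of the sum of squares. */
    static final class L2NormCollector implements Collector {
        private double sumOfSquares;

        public void collect(int index, double value) {
            sumOfSquares += value * value;
        }

        public double result() {
            return Math.sqrt(sumOfSquares);
        }
    }

    /** Largest absolute value. */
    static final class LInfNormCollector implements Collector {
        private double max;

        public void collect(int index, double value) {
            max = Math.max(max, Math.abs(value));
        }

        public double result() {
            return max;
        }
    }

    /** Stand-in for the proposed RealVector.collect(Collector), here over a plain array. */
    static double collect(double[] entries, Collector collector) {
        for (int i = 0; i < entries.length; i++) {
            collector.collect(i, entries[i]);
        }
        return collector.result();
    }

    public static void main(String[] args) {
        double[] v = {3.0, -4.0};
        System.out.println(collect(v, new L1NormCollector()));   // 7.0
        System.out.println(collect(v, new L2NormCollector()));   // 5.0
        System.out.println(collect(v, new LInfNormCollector())); // 4.0
    }
}
```

Each collector is stateful, so a fresh instance is needed per call. A generic
inner product would be one more Collector that holds a reference to the
second vector and reads its entry at the given index.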
> 
> 
>>>   * why all the methods which are overloaded to take either RealVector
>>> or
>>> double[] (getDistance, dotProduct, add, etc...) - is there really that
>>> much
>>> overhead in just implementing dotProduct(double[] d)  as just
>>> dotProduct(new
>>> ArrayRealVector(d, false)); - no copy is done, nothing is done but
>>> one
>>> object creation...
>> It's not the copy that could take time, but the iteration which needs to
>> call getEntry(). So yes, there is some overhead and it can be avoided by
>> providing the simple array version. Of course, a default implementation that
>> wraps the array into an ArrayRealVector can be added to the
>> AbstractRealVector class you proposed above, in order to simplify new
>> implementations.
>>
> 
> This depends on the implementation details: when
> ArrayRealVector.dotProduct is passed another instance of ArrayRealVector,
> the two have access to each other's internals and can avoid this getEntry()
> call altogether.  Other subclasses can have similar speedup strategies.  I
> can try and whip up a patch and some perf tests to check speed of these
> operations to verify - another JIRA ticket, I think? :)
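
The trade-off discussed here can be sketched as follows. Vec and ArrayVec
are hypothetical stand-ins for RealVector and ArrayRealVector, written only
to show the pattern: a generic path that goes through getEntry(), a fast
path taken when both operands are array-backed, and a constructor that can
wrap an array without copying it. Whether the fast path is measurably
faster is exactly what the proposed perf tests would have to show.

```java
// Hypothetical sketch; Vec and ArrayVec are illustrative, not commons-math classes.
final class DotProductSketch {

    interface Vec {
        int getDimension();
        double getEntry(int index);
        double dotProduct(Vec other);
    }

    static final class ArrayVec implements Vec {
        private final double[] data;

        /** Wraps the array directly when copy is false, clones it otherwise. */
        ArrayVec(double[] d, boolean copy) {
            this.data = copy ? d.clone() : d;
        }

        public int getDimension() {
            return data.length;
        }

        public double getEntry(int index) {
            return data[index];
        }

        public double dotProduct(Vec other) {
            if (other instanceof ArrayVec) {
                // Fast path: read the other vector's backing array directly.
                return dotProduct(((ArrayVec) other).data);
            }
            if (other.getDimension() != data.length) {
                throw new IllegalArgumentException("dimension mismatch");
            }
            // Generic path: one getEntry() call per element.
            double sum = 0.0;
            for (int i = 0; i < data.length; i++) {
                sum += data[i] * other.getEntry(i);
            }
            return sum;
        }

        /** Array overload: no wrapper object and no getEntry() calls. */
        double dotProduct(double[] other) {
            if (other.length != data.length) {
                throw new IllegalArgumentException("dimension mismatch");
            }
            double sum = 0.0;
            for (int i = 0; i < data.length; i++) {
                sum += data[i] * other[i];
            }
            return sum;
        }
    }
}
```

An abstract base class could then implement the array overload once, by
wrapping the array without a copy and delegating to dotProduct(Vec), leaving
array-backed subclasses free to override it with the direct loop.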
> 
> 
>>>   * SparseVector is just a marker interface?  Does it serve any
>>> purpose?
>> For now, yes it is a marker interface. There was some discussion about
>> these interfaces just before the release of 2.0. The conclusion was that
>> they should remain simple markers at that time.
>>
> 
> Fair enough.
> 
> 
>> The idea was really that people could provide their own implementations.
>> Some methods that are close in spirit to the iterators you ask for are in
>> the matrix interfaces (the walkXxx methods) and are used in many algorithms
>> inside [math].
>>
> 
> Ok great, I'll try to play around with those.
> 
> 
>> If you intend to contribute them to [math], you'll have to put them on JIRA
>> and send a Software Grant <http://www.apache.org/licenses/#grants> to the
>> Apache secretary. If you develop contributions directly for [math] (i.e. if
>> it is not preexisting software), then rather than a Software Grant we will
>> need a Contributor License Agreement (CLA), either an Individual CLA
>> or a Corporate CLA <http://www.apache.org/licenses/#clas>.
>>
> 
> Yeah, I'm down with the "apache way", I'll attach patches to the JIRA
> tickets after clicking the lovely "you can have this" button.  None of the
> stuff I'm talking about contributing is a "large body of code" which needs a
> special grant (I'm sending Mahout a bunch of stuff which may need that,
> although I'm the only contributor to the project I'm donating, so I'm not
> sure it's needed even in that case).
> 
>     -jake
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

