mahout-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nicolas Maillot <nmail...@gmail.com>
Subject Re: Mahalanobis Distance Implementation
Date Mon, 19 Jul 2010 20:08:30 GMT
Hi Ted,

Thanks for the piece of advice.
I have put a first implementation of the Mahalanobis distance in
http://www.nicolas-maillot.net/MahoutContribs/
Those files should be put in three different places to be tested:

-in core/src/test/java/org/apache/mahout/common/distance :
http://www.nicolas-maillot.net/MahoutContribs/TestMahalanobisDistanceMeasure.java

- in  core/src/main/java/org/apache/mahout/common/distance :
http://www.nicolas-maillot.net/MahoutContribs/MahalanobisDistanceMeasure.java

-in math/src/main/java/org/apache/mahout/math :
http://www.nicolas-maillot.net/MahoutContribs/SingularValueDecomposition.java
http://www.nicolas-maillot.net/MahoutContribs/Algebra.java

All compiles and the unit tests are locally fine with the latest
version of Mahout.
I only added new files, so it limit risks.

As you can see, I have migrated SingularValueDecomposition (useful to
tackle the inverse covariance problem)  to use the Matrix Class.
I just Started with Algebra too (just what I needed for the
Mahalanobis stuff), I will continue asap.

One  question, is putting those new implementations in
src/main/java/org/apache/mahout/math a good way to go ?
Migrating the stuff in new files is easier, once we are done, we will
be able to remove the old implementation.
We just need to find the best place for that.

Any comment or feedback is welcome. I will then submit an official
path with a few more comments.

Many thanks,

Cheers,

Nicolas






On Sat, Jul 17, 2010 at 11:40 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> On Sat, Jul 17, 2010 at 2:07 PM, Nicolas Maillot <nmaillot@gmail.com> wrote:
>
>> Hi all,
>>
>> I would like to contribute to Mahout by starting with a simple  task.
>>
>> I had the idea of Implementing the Mahalanobis distance :
>> http://en.wikipedia.org/wiki/Mahalanobis_distance
>> This should be quite easy and could be a useful feature.
>>
>> After looking  at the Mahout code for a couple of hours, I have three
>> questions:
>>
>> 1) Generally speaking, what is the level of reliability of algorithms
>> implemented in matrix.linalg ?
>>
>
> They should be reasonably good, but they mostly lack unit tests.  Further,
> they have generally not been converted to use Vector and Matrix which is
> something that is probably necessary over time.  I have done a little bit in
> this line, but we need quite a bit more.
>
> As you find bits that you want to use, you should convert them over.  THis
> will be a bit like pulling a thread on a sweater in that it will cause lots
> of additional bits to need conversion.  I will help with this.  As part of
> the conversion, we should add unit tests as well.
>
>
>> 2) As inverting a covariance matrix is required, is using the method
>> inverse(DoubleMatrix2D A) from class Algebra a good way to achieve this ?
>>
>
> Generally it is considered very bad practice to invert a matrix.  Instead,
> you should use some sort of easily solved matrix decomposition.  Commonly,
> QR decomposition is used, but SVD might be useful in certain situations.
>
> As I remember Mahalonobis, it uses moments to estimate principal components
> in the form of the inverse covariance matrix.  Mahalonobis distances are
> then computed in this reference frame.  This should be amenable to QR
> decomposition.
>
>
>> 3) Does is look reasonable if the computation of the distance is based on
>> dense matrices ( inside the method distance(Vector v1,Vector v2) method
>> part
>> of the future MahalanobisDistanceMeasure class) ?
>>
>
> For use in recommendations, it should be assumed that the inputs are sparse,
> but internal products might well be dense if they are fixed in size and
> relatively small.  The computation of distance should accept sparse vectors,
> but projecting them down to small dense vectors to do the distance
> computation is just fine.
>
>
>> 4) What is the best way to create a DoubleMatrix1D from a Vector ?
>>
>
> Don't.
>
> Just convert the code using the DoubleMatrix1D as mentioned above.
>

Mime
View raw message