mahout-dev mailing list archives

From "Jeff Eastman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-66) EuclideanDistanceMeasure and ManhattanDistanceMeasure classes are not optimized for Sparse Vectors
Date Wed, 13 May 2009 22:36:45 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-66?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709184#action_12709184 ]

Jeff Eastman commented on MAHOUT-66:
------------------------------------

r774566 implemented the SparseVector times optimizations and an associated unit test that
demonstrates a 5-10ms improvement when used with 50,000-cardinality vectors containing 1,000
random elements, typical of text clustering applications.

Upon inspection of the ManhattanDistanceMeasure and EuclideanDistanceMeasure implementations,
I think that replacing a single loop over all elements with vector operations (even optimized
ones) that each do their own iteration will not improve performance, even in the situations above.
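
To make the two shapes being compared concrete, here is a minimal sketch of a Manhattan distance
written both ways. It is not the Mahout code or the attached patch: the class name, the method
names, and the use of a plain Map<Integer, Double> as a stand-in for a sparse vector are all
assumptions made for illustration. It shows only that the composed form makes two passes and
allocates a temporary, and it makes no claim about which form is faster.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; a Map<Integer, Double> stands in for a sparse vector.
final class SparseDistanceSketch {

    // Shape 1: a single loop over every index up to the cardinality.
    static double manhattanSingleLoop(Map<Integer, Double> a,
                                      Map<Integer, Double> b,
                                      int cardinality) {
        double sum = 0.0;
        for (int i = 0; i < cardinality; i++) {
            // Missing entries are treated as zero.
            sum += Math.abs(a.getOrDefault(i, 0.0) - b.getOrDefault(i, 0.0));
        }
        return sum;
    }

    // Shape 2, step 1: a "minus" vector operation that iterates and builds a temporary.
    static Map<Integer, Double> minus(Map<Integer, Double> a, Map<Integer, Double> b) {
        Map<Integer, Double> result = new HashMap<>(a);
        for (Map.Entry<Integer, Double> e : b.entrySet()) {
            result.merge(e.getKey(), -e.getValue(), Double::sum);
        }
        return result;
    }

    // Shape 2, step 2: a norm operation that iterates again over the temporary.
    static double norm1(Map<Integer, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += Math.abs(x);
        }
        return sum;
    }

    // Composed form: two separate iterations plus one allocation.
    static double manhattanComposed(Map<Integer, Double> a, Map<Integer, Double> b) {
        return norm1(minus(a, b));
    }
}
```

Both methods return the same distance for the same inputs. A timing test along the lines
requested in this comment would call each of them on vectors of the size mentioned above.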

I'm still open to a test which demonstrates I'm wrong. The proposed optimizations are certainly
cleaner looking.

> EuclideanDistanceMeasure and ManhattanDistanceMeasure classes are not optimized for Sparse Vectors
> --------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-66
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-66
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>            Reporter: Pallavi Palleti
>            Priority: Minor
>         Attachments: MAHOUT-66.patch, MAHOUT-66.patch, MAHOUT-66.patch, MAHOUT-66.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

