mahout-dev mailing list archives

From "Ted Dunning (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-300) Solve performance issues with Vector Implementations
Date Sun, 21 Feb 2010 21:45:27 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836452#action_12836452 ]

Ted Dunning commented on MAHOUT-300:
------------------------------------

Huh... some of those times are a little surprising.

For DotProduct and CosineDistanceMeasure, SequentialAccessSparseVector is roughly 3x faster than RandomAccessSparseVector
and 8x faster than DenseVector. There the world is good.

But for SquaredEuclideanDistanceMeasure and TanimotoDistanceMeasure, SequentialAccessSparseVector is only modestly faster
than RandomAccessSparseVector (about 1.2x and 1.8x), while for ManhattanDistanceMeasure it is actually slower (by more than 2x).
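For reference, in their unweighted forms both of these measures can be written in terms of one dot product and two squared norms. These are standard identities, not a claim about how Mahout currently computes them, but they suggest that both measures could in principle reuse a fast sequential dot product:

```latex
d_{\mathrm{sqEuc}}(a,b) = \lVert a-b\rVert^2 = \lVert a\rVert^2 + \lVert b\rVert^2 - 2\,a\cdot b

d_{\mathrm{Tan}}(a,b) = 1 - \frac{a\cdot b}{\lVert a\rVert^2 + \lVert b\rVert^2 - a\cdot b}
```

Manhattan distance, \(\sum_i |a_i - b_i|\), has no such decomposition, so it needs its own pass over the union of the two vectors' non-zero indices.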

Is it just that these last three distance measures have not yet been written to take advantage of sequential access?
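A minimal sketch of what taking sequentiality into account could look like for the Manhattan case. It assumes both sparse vectors expose their non-zeros as parallel arrays of strictly increasing indices and values; that layout, and the names `SequentialManhattan` and `manhattan`, are hypothetical and chosen only for illustration (they are not Mahout APIs). A single merge pass touches each non-zero exactly once and never performs a random lookup:

```java
/**
 * Hypothetical sketch (not Mahout code): Manhattan (L1) distance between two
 * sparse vectors, each stored as parallel arrays of strictly increasing
 * indices and their values. One merge pass, no per-index lookups.
 */
public class SequentialManhattan {

    static double manhattan(int[] idxA, double[] valA, int[] idxB, double[] valB) {
        double sum = 0.0;
        int i = 0;
        int j = 0;
        while (i < idxA.length && j < idxB.length) {
            if (idxA[i] == idxB[j]) {
                // index is non-zero in both vectors
                sum += Math.abs(valA[i] - valB[j]);
                i++;
                j++;
            } else if (idxA[i] < idxB[j]) {
                // index is non-zero only in a; b is implicitly zero there
                sum += Math.abs(valA[i]);
                i++;
            } else {
                // index is non-zero only in b; a is implicitly zero there
                sum += Math.abs(valB[j]);
                j++;
            }
        }
        // drain whichever vector still has non-zeros left
        while (i < idxA.length) {
            sum += Math.abs(valA[i++]);
        }
        while (j < idxB.length) {
            sum += Math.abs(valB[j++]);
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] idxA = {0, 3, 7};
        double[] valA = {1.0, -2.0, 4.0};
        int[] idxB = {3, 5, 7};
        double[] valB = {1.0, 2.5, 4.0};
        // |1 - 0| + |-2 - 1| + |0 - 2.5| + |4 - 4| = 6.5
        System.out.println(manhattan(idxA, valA, idxB, valB));
    }
}
```

The same merge skeleton also works for the other measures; only the per-index accumulation changes.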

{noformat}
Distance measures are in org.apache.mahout.common.distance.
DotProduct is a rate in MB/s; all other rows are speeds in /sec.

                                  DenseVector    RandomAccessSparseVector    SequentialAccessSparseVector
DotProduct (MB/s)                 3877.9443      9846.534                    31736.123
CosineDistanceMeasure             1690.1599      3366.8774                   12309.282
EuclideanDistanceMeasure          2913.8206      5868.9404                   8209.688
ManhattanDistanceMeasure          867.9127       2435.4307                   1048.7443
SquaredEuclideanDistanceMeasure   3387.1472      7091.4087                   8785.509
TanimotoDistanceMeasure           1803.4031      3873.8967                   6844.7017
{noformat}
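The speedups quoted in the comment can be rechecked directly from the benchmark figures. This sketch assumes the three columns are, in order, DenseVector, RandomAccessSparseVector and SequentialAccessSparseVector, an order inferred from the ratios stated in the comment rather than given in the original output; `SpeedupRatios` is a hypothetical helper:

```java
/**
 * Hypothetical helper: recomputes sequential-vs-random and
 * sequential-vs-dense speedups from the reported benchmark numbers.
 * Column order {dense, randomAccess, sequentialAccess} is an assumption.
 */
public class SpeedupRatios {

    static final String[] NAMES = {
        "DotProduct", "CosineDistanceMeasure", "EuclideanDistanceMeasure",
        "ManhattanDistanceMeasure", "SquaredEuclideanDistanceMeasure", "TanimotoDistanceMeasure"
    };

    // {DenseVector, RandomAccessSparseVector, SequentialAccessSparseVector}
    static final double[][] RATES = {
        {3877.9443, 9846.534, 31736.123},
        {1690.1599, 3366.8774, 12309.282},
        {2913.8206, 5868.9404, 8209.688},
        {867.9127, 2435.4307, 1048.7443},
        {3387.1472, 7091.4087, 8785.509},
        {1803.4031, 3873.8967, 6844.7017}
    };

    static double seqOverRandom(int row) {
        return RATES[row][2] / RATES[row][1];
    }

    static double seqOverDense(int row) {
        return RATES[row][2] / RATES[row][0];
    }

    public static void main(String[] args) {
        for (int r = 0; r < NAMES.length; r++) {
            System.out.printf("%-32s seq/random = %5.2fx   seq/dense = %5.2fx%n",
                    NAMES[r], seqOverRandom(r), seqOverDense(r));
        }
    }
}
```

For DotProduct this gives roughly 3.2x over RandomAccessSparseVector and 8.2x over DenseVector; for ManhattanDistanceMeasure the sequential/random ratio is about 0.43, i.e. the sequential implementation is the slower one.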


> Solve performance issues with Vector Implementations
> ----------------------------------------------------
>
>                 Key: MAHOUT-300
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-300
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.3
>            Reporter: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-300.patch, MAHOUT-300.patch, MAHOUT-300.patch, MAHOUT-300.patch, MAHOUT-300.patch
>
>
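> AbstractVector operations such as times(double) are currently implemented like this:
>
>   public Vector times(double x) {
>     Vector result = clone();
>     // Iterates over this vector's non-zeros and writes each product into the
>     // clone by index; for sparse vectors that is an extra lookup per element.
>     Iterator<Element> iter = iterateNonZero();
>     while (iter.hasNext()) {
>       Element element = iter.next();
>       int index = element.index();
>       result.setQuick(index, element.get() * x);
>     }
>     return result;
>   }
>
> They should instead be implemented as follows:
>
>   public Vector times(double x) {
>     Vector result = clone();
>     // Iterates over the clone's own non-zeros and updates each one in place
>     // through its Element, so no per-index lookup is needed.
>     Iterator<Element> iter = result.iterateNonZero();
>     while (iter.hasNext()) {
>       Element element = iter.next();
>       element.set(element.get() * x);
>     }
>     return result;
>   }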
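A toy, self-contained illustration of the two strategies above, using a hypothetical `ToySparseVector` backed by `java.util.HashMap` as a stand-in for a random-access sparse vector (this is not Mahout's `RandomAccessSparseVector`). Both methods produce the same result; the first writes each product back with a keyed `put`, an extra hash lookup per non-zero, while the second updates each entry in place through `Map.Entry.setValue`:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Mahout code) contrasting the two times(double)
 * strategies on a HashMap-backed sparse vector.
 */
public class TimesSketch {

    static class ToySparseVector {
        final Map<Integer, Double> values = new HashMap<>();

        ToySparseVector copy() {
            ToySparseVector c = new ToySparseVector();
            c.values.putAll(values);
            return c;
        }

        // Original strategy: iterate this vector's entries, then write each
        // product into the copy by key (a second hash lookup per element).
        ToySparseVector timesBySetQuick(double x) {
            ToySparseVector result = copy();
            for (Map.Entry<Integer, Double> e : values.entrySet()) {
                result.values.put(e.getKey(), e.getValue() * x);
            }
            return result;
        }

        // Proposed strategy: iterate the copy's own entries and update each
        // value in place through the entry, with no key lookup.
        ToySparseVector timesInPlace(double x) {
            ToySparseVector result = copy();
            for (Map.Entry<Integer, Double> e : result.values.entrySet()) {
                e.setValue(e.getValue() * x);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        ToySparseVector v = new ToySparseVector();
        v.values.put(2, 1.5);
        v.values.put(7, -4.0);
        // Both strategies yield {2=3.0, 7=-8.0}
        System.out.println(v.timesBySetQuick(2.0).values.equals(v.timesInPlace(2.0).values));
    }
}
```

Updating through the iterator's own entry is safe here because `setValue` does not structurally modify the map, so the iteration is not invalidated.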

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

