mahout-dev mailing list archives

From Robin Anil <robin.a...@gmail.com>
Subject Re: [jira] [Commented] (MAHOUT-1190) SequentialAccessSparseVector function assignment is very slow
Date Sun, 14 Apr 2013 16:34:18 GMT
It's ops per second. The machine is a 15" late-2012 MBP with Retina.
On Apr 14, 2013 11:32 AM, "Ted Dunning" <ted.dunning@gmail.com> wrote:

> Big is good in this spreadsheet?
>
>
> On Sun, Apr 14, 2013 at 8:56 AM, Robin Anil (JIRA) <jira@apache.org>
> wrote:
>
> >
> >     [
> >
> https://issues.apache.org/jira/browse/MAHOUT-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631336#comment-13631336
> ]
> >
> > Robin Anil commented on MAHOUT-1190:
> > ------------------------------------
> >
> > Here is a summary of the improvements across two sparsities (100:1,
> 10:1).
> > At 10K cardinality DenseVectors are about 3x faster on CosineDistance
> than
> > the sparse format.
> >
> >
> https://docs.google.com/spreadsheet/ccc?key=0AhewTD_ZgznddGFQbWJCQTZXSnFULUYzdURfWDRJQlE#gid=1
> >
> > > SequentialAccessSparseVector function assignment is very slow
> > > -------------------------------------------------------------
> > >
> > >                 Key: MAHOUT-1190
> > >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1190
> > >             Project: Mahout
> > >          Issue Type: Bug
> > >            Reporter: Dan Filimon
> > >         Attachments: MAHOUT-1190-1.patch, MAHOUT-1190.patch
> > >
> > >
> > > Currently, when calling .assign() on a SASV with another vector and a
> > custom function, it will iterate through it and assign every single entry
> > while also referring to it by index.
> > > This makes the process *hugely* expensive. (On a run of BallKMeans on
> > the 20 newsgroups data set, profiling reveals that 92% of the runtime was
> > spent assigning the vectors.)
> > > Here's a prototype patch:
> > >
> >
> https://github.com/dfilimon/mahout/commit/63998d82bb750150a6ae09052dadf6c326c62d3d
> >
> > --
> > This message is automatically generated by JIRA.
> > If you think it was sent incorrectly, please contact your JIRA
> > administrators
> > For more information on JIRA, see:
> http://www.atlassian.com/software/jira
> >
>
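
For readers following the issue description above, here is a rough sketch of why per-index assignment on a sorted sparse layout can get expensive, and what a merge-style alternative might look like. This is not Mahout code and not the linked patch: the class SortedSparseSketch and its methods are hypothetical, the parallel sorted arrays are only an assumed stand-in for how a sequential-access sparse vector stores its entries, and the merge variant assumes the function maps (0, 0) to 0.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

// Hypothetical toy vector: sorted indices plus parallel values (not the Mahout class).
class SortedSparseSketch {
    int[] idx = new int[4];
    double[] val = new double[4];
    int size = 0;

    // Lookup by index costs a binary search.
    double get(int index) {
        int pos = Arrays.binarySearch(idx, 0, size, index);
        return pos >= 0 ? val[pos] : 0.0;
    }

    // Store by index: binary search, plus shifting the tail when a new entry is inserted.
    void set(int index, double value) {
        int pos = Arrays.binarySearch(idx, 0, size, index);
        if (pos >= 0) { val[pos] = value; return; }
        if (value == 0.0) return; // nothing to store for an absent zero
        int ins = -pos - 1;
        if (size == idx.length) {
            int cap = Math.max(4, size * 2);
            idx = Arrays.copyOf(idx, cap);
            val = Arrays.copyOf(val, cap);
        }
        System.arraycopy(idx, ins, idx, ins + 1, size - ins);
        System.arraycopy(val, ins, val, ins + 1, size - ins);
        idx[ins] = index;
        val[ins] = value;
        size++;
    }

    // Slow pattern: visit every index up to the cardinality and go through get/set each time.
    void assignByIndex(SortedSparseSketch other, DoubleBinaryOperator f, int cardinality) {
        for (int i = 0; i < cardinality; i++) {
            set(i, f.applyAsDouble(get(i), other.get(i)));
        }
    }

    // Merge pattern: one linear pass over both sorted index arrays.
    // Assumes f(0, 0) == 0, so indices absent from both vectors can be skipped.
    void assignByMerge(SortedSparseSketch other, DoubleBinaryOperator f) {
        int[] outIdx = new int[size + other.size];
        double[] outVal = new double[size + other.size];
        int a = 0, b = 0, n = 0;
        while (a < size || b < other.size) {
            int ia = a < size ? idx[a] : Integer.MAX_VALUE;
            int ib = b < other.size ? other.idx[b] : Integer.MAX_VALUE;
            int i = Math.min(ia, ib);
            double x = (ia == i) ? val[a++] : 0.0;
            double y = (ib == i) ? other.val[b++] : 0.0;
            double r = f.applyAsDouble(x, y);
            if (r != 0.0) { outIdx[n] = i; outVal[n] = r; n++; }
        }
        idx = outIdx;
        val = outVal;
        size = n;
    }

    public static void main(String[] args) {
        SortedSparseSketch u = new SortedSparseSketch();
        u.set(1, 1.0);
        u.set(5, 2.0);
        SortedSparseSketch v = new SortedSparseSketch();
        v.set(5, 3.0);
        v.set(7, 4.0);
        u.assignByMerge(v, Double::sum);
        for (int k = 0; k < u.size; k++) {
            System.out.println(u.idx[k] + " -> " + u.val[k]);
        }
    }
}
```

In this toy model, assignByIndex does work proportional to the full cardinality (with a binary search per lookup and an array shift per insertion), while assignByMerge only touches the stored entries of the two vectors. Whether the actual patch takes this shape is not stated in the thread.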
