mahout-dev mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-638) Stochastic svd's is not handling well all cases of sparse vectors
Date Thu, 31 Mar 2011 09:55:05 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013899#comment-13013899 ]

Sean Owen commented on MAHOUT-638:
----------------------------------

Vector has the methods isDense() and isSequentialAccess(), which should prevent us from ever
needing instanceof. The patch doesn't quite match the code style, but that's minor enough. I'll
clean it up eventually, though you might have a glance at the rest of the code for spacing
/ brace conventions and such.

Otherwise I'm sure you're welcome to proceed.
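
To illustrate the suggestion above, here is a minimal sketch of dispatching on isDense() and isSequentialAccess() rather than on instanceof. The nested Vector interface is a hypothetical stand-in that declares only the two methods named in the comment, so the example compiles without Mahout on the classpath; the Strategy enum, chooseStrategy() and vectorOf() are likewise illustrative names, not part of Mahout or of the attached patch.

```java
public class VectorDispatchSketch {

    /** Hypothetical stand-in declaring only the two properties named in the comment. */
    interface Vector {
        boolean isDense();
        boolean isSequentialAccess();
    }

    /** Illustrative labels for how a job might walk a row. */
    enum Strategy { DENSE_SCAN, SEQUENTIAL_NON_ZERO, RANDOM_ACCESS_NON_ZERO }

    /** Picks a traversal strategy from the vector's properties; no instanceof needed. */
    static Strategy chooseStrategy(Vector v) {
        if (v.isDense()) {
            return Strategy.DENSE_SCAN;
        }
        return v.isSequentialAccess()
            ? Strategy.SEQUENTIAL_NON_ZERO
            : Strategy.RANDOM_ACCESS_NON_ZERO;
    }

    /** Builds a stub vector that reports the given properties. */
    static Vector vectorOf(final boolean dense, final boolean sequential) {
        return new Vector() {
            @Override public boolean isDense() { return dense; }
            @Override public boolean isSequentialAccess() { return sequential; }
        };
    }

    public static void main(String[] args) {
        System.out.println(chooseStrategy(vectorOf(true, true)));
        System.out.println(chooseStrategy(vectorOf(false, true)));
        System.out.println(chooseStrategy(vectorOf(false, false)));
    }
}
```

Branching on the properties keeps the calling code independent of concrete classes, so a new Vector implementation that reports the same properties would take the same path without further changes.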

> Stochastic svd's is not handling well all cases of sparse vectors 
> ------------------------------------------------------------------
>
>                 Key: MAHOUT-638
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-638
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.5
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.5
>
>         Attachments: MAHOUT-638.patch
>
>
> The Mahout patch of the algorithm is not handling all types of sparse input efficiently.
> BtJob doesn't handle SequentialAccessSparseVector in a way that picks only non-zero elements
> from the initial input, and QJob doesn't iterate over RandomAccessSparseVector correctly. With
> extremely sparse inputs (0.05% non-zero elements) that leads to terrible inefficiency in the
> aforementioned jobs (QJob, BtJob).
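
A rough sketch of why this matters, assuming a toy map-backed row rather than Mahout's own vector classes: SparseRow, dotAllIndices() and dotNonZeros() are hypothetical names invented for this illustration and do not reflect how QJob or BtJob are written. With 5 stored entries out of 10,000 (0.05%), the first loop touches every index while the second touches only the stored ones, and both give the same dot product.

```java
import java.util.Map;
import java.util.TreeMap;

public class NonZeroIterationSketch {

    /** Hypothetical sparse row: stores only non-zero entries, ordered by index. */
    static final class SparseRow {
        final int cardinality;
        final TreeMap<Integer, Double> nonZeros = new TreeMap<>();

        SparseRow(int cardinality) {
            this.cardinality = cardinality;
        }

        void set(int index, double value) {
            if (value != 0.0) {
                nonZeros.put(index, value);
            } else {
                nonZeros.remove(index);
            }
        }

        double get(int index) {
            Double v = nonZeros.get(index);
            return v == null ? 0.0 : v;
        }
    }

    /** Inefficient form: visits every index, including all the zeros. */
    static double dotAllIndices(SparseRow row, double[] dense) {
        double sum = 0.0;
        for (int i = 0; i < row.cardinality; i++) {
            sum += row.get(i) * dense[i];
        }
        return sum;
    }

    /** Sparse-aware form: visits only the stored non-zero entries. */
    static double dotNonZeros(SparseRow row, double[] dense) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : row.nonZeros.entrySet()) {
            sum += e.getValue() * dense[e.getKey()];
        }
        return sum;
    }

    /** Builds a 10,000-element row with 5 non-zero entries (0.05%). */
    static SparseRow exampleRow() {
        SparseRow row = new SparseRow(10000);
        int[] indices = {0, 2500, 5000, 7500, 9999};
        for (int k = 0; k < indices.length; k++) {
            row.set(indices[k], k + 1.0);
        }
        return row;
    }

    public static void main(String[] args) {
        SparseRow row = exampleRow();
        double[] dense = new double[row.cardinality];
        java.util.Arrays.fill(dense, 2.0);
        System.out.println("all indices: " + dotAllIndices(row, dense)
            + " after " + row.cardinality + " visits");
        System.out.println("non-zeros:   " + dotNonZeros(row, dense)
            + " after " + row.nonZeros.size() + " visits");
    }
}
```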

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
