mahout-dev mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAHOUT-320) Modify IntPairWritable in LDA implementation to be binary comparable to improve performance.
Date Thu, 04 Mar 2010 12:49:27 GMT

     [ https://issues.apache.org/jira/browse/MAHOUT-320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-320:
-----------------------------

    Attachment: IntPairWritable.patch

I see. The idea is to flip the sign, making positive numbers negative and vice versa, so that
WritableComparator's byte-level compare() can be used, since it treats the bytes as essentially
unsigned values. Storing values this way will surely end in tears. In particular, it has already
broken Frequency in the same class, which reads the values as unsigned ints directly.
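
To make the trade-off concrete, here is a minimal, self-contained sketch of the sign-flip trick. It is
not the code from the attached patch; the class and method names (SignFlipSketch, encode, decode,
readRaw, compareBytes) are hypothetical, and the comparison is a plain stand-in for an unsigned
byte-wise compare rather than Hadoop's own implementation.

```java
public class SignFlipSketch {

    // Hypothetical helper: store v big-endian with its sign bit flipped, so that
    // unsigned lexicographic byte order matches signed int order.
    static byte[] encode(int v) {
        int u = v ^ Integer.MIN_VALUE; // flip the sign bit
        return new byte[] {
            (byte) (u >>> 24), (byte) (u >>> 16), (byte) (u >>> 8), (byte) u
        };
    }

    // Reads the four stored bytes as a big-endian int, without undoing the flip.
    static int readRaw(byte[] b) {
        return ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16)
             | ((b[2] & 0xff) << 8) | (b[3] & 0xff);
    }

    // Inverse of encode: read the stored int, then flip the sign bit back.
    static int decode(byte[] b) {
        return readRaw(b) ^ Integer.MIN_VALUE;
    }

    // Lexicographic comparison that treats every byte as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < a.length && i < b.length; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // Byte order now agrees with signed order: -5 sorts before 3.
        System.out.println(compareBytes(encode(-5), encode(3)) < 0); // true
        // A reader that undoes the flip recovers the original value.
        System.out.println(decode(encode(-5))); // -5
        // A reader that forgets to undo the flip sees a different number.
        System.out.println(readRaw(encode(3))); // -2147483645
    }
}
```

The last line of main shows the kind of breakage described above: any code that reads the stored
bytes directly, without reversing the flip, gets the wrong value.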

Here's my complete patch covering all of the items discussed.

> Modify IntPairWritable in LDA implementation to be binary comparable to improve performance.
> --------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-320
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-320
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.3
>            Reporter: Drew Farris
>            Assignee: Robin Anil
>            Priority: Minor
>         Attachments: IntPairWritable.patch, MAHOUT-320.patch, MAHOUT-320.patch, MAHOUT-320.patch,
>                      MAHOUT-320.patch, MAHOUT-320.patch
>
>
> Per discussion with Robin, modifying o.a.m.clustering.lda.IntPairWritable to be binary
> comparable will improve the performance of the comparison operations during a sort because
> no marshaling will need to occur to compare IntPairWritable instances.
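
As a rough illustration of what "binary comparable" buys during a sort, the sketch below orders
serialized int pairs by comparing their raw bytes and deserializes only once, after sorting. The
8-byte layout (two big-endian ints with the sign bit flipped) and all names here are assumptions
made for the example, not the actual IntPairWritable format or API.

```java
import java.util.Arrays;

public class SerializedPairSortSketch {

    // Assumed layout: first int in bytes 0..3, second int in bytes 4..7,
    // each big-endian with the sign bit flipped.
    static byte[] serialize(int first, int second) {
        byte[] out = new byte[8];
        putFlipped(out, 0, first);
        putFlipped(out, 4, second);
        return out;
    }

    static void putFlipped(byte[] buf, int off, int v) {
        int u = v ^ Integer.MIN_VALUE;
        buf[off] = (byte) (u >>> 24);
        buf[off + 1] = (byte) (u >>> 16);
        buf[off + 2] = (byte) (u >>> 8);
        buf[off + 3] = (byte) u;
    }

    static int getFlipped(byte[] buf, int off) {
        int u = ((buf[off] & 0xff) << 24) | ((buf[off + 1] & 0xff) << 16)
              | ((buf[off + 2] & 0xff) << 8) | (buf[off + 3] & 0xff);
        return u ^ Integer.MIN_VALUE;
    }

    // Compares two serialized pairs byte by byte (unsigned); no objects are rebuilt.
    static int compareRaw(byte[] a, byte[] b) {
        for (int i = 0; i < 8; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    // Sorts pairs in their serialized form and decodes them only at the end.
    static int[][] sortPairs(int[][] pairs) {
        byte[][] records = new byte[pairs.length][];
        for (int i = 0; i < pairs.length; i++) {
            records[i] = serialize(pairs[i][0], pairs[i][1]);
        }
        Arrays.sort(records, SerializedPairSortSketch::compareRaw);
        int[][] sorted = new int[pairs.length][2];
        for (int i = 0; i < records.length; i++) {
            sorted[i][0] = getFlipped(records[i], 0);
            sorted[i][1] = getFlipped(records[i], 4);
        }
        return sorted;
    }

    public static void main(String[] args) {
        int[][] sorted = sortPairs(new int[][] {{2, -1}, {-3, 7}, {2, -9}, {-3, -4}});
        System.out.println(Arrays.deepToString(sorted));
        // [[-3, -4], [-3, 7], [2, -9], [2, -1]]
    }
}
```

Every comparison made by the sort touches only the byte arrays, which is the saving the issue
description refers to.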

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
