mahout-dev mailing list archives

From "Jake Mannix (JIRA)" <>
Subject [jira] [Created] (MAHOUT-1009) Remove old LDA implementation from codebase
Date Tue, 08 May 2012 17:41:52 GMT
Jake Mannix created MAHOUT-1009:

             Summary: Remove old LDA implementation from codebase
                 Key: MAHOUT-1009
             Project: Mahout
          Issue Type: Improvement
          Components: Clustering
    Affects Versions: 0.7
            Reporter: Jake Mannix
            Priority: Minor
             Fix For: 0.7

The old LDA is unmaintained and unsupported.  We already (since 0.6) have a newer, faster
version in the o.a.m.clustering.lda.cvb package, which I'm actively working on and using in
production at Twitter.  We should delete the old o.a.m.clustering.lda codebase.

Normally, I'd say that we should at the same time promote o.a.m.clustering.lda.cvb up one package level,
but that would cause some serious merge conflicts on my GitHub branch (with updates/improvements/new
features targeted for 0.8). Instead, we can get users onto the new code simply by changing driver.classes.props
so that "lda" points to CVB0Driver as the main().
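A minimal sketch of the remapping idea follows. It assumes that each driver.classes.props entry has the form "fully.qualified.ClassName = shortName : description", and that the launcher picks the class whose short name matches the first command-line argument. The class DriverAliasSketch, its aliases() method, and the description text "LDA via CVB0" are hypothetical. This is not MahoutDriver's actual code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical sketch (not MahoutDriver): maps a short program name such as
 * "lda" to a driver class, ASSUMING props entries of the form
 *   fully.qualified.ClassName = shortName : description
 */
public class DriverAliasSketch {

  // Builds a shortName -> className map from props-formatted text.
  static Map<String, String> aliases(String propsText) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propsText));
    } catch (IOException e) {
      // Reading from an in-memory string should not fail; rethrow unchecked.
      throw new UncheckedIOException(e);
    }
    Map<String, String> byShortName = new HashMap<>();
    for (String className : props.stringPropertyNames()) {
      String value = props.getProperty(className);
      // Assumed convention: the short name is everything before the first ':'.
      int colon = value.indexOf(':');
      String shortName = (colon >= 0 ? value.substring(0, colon) : value).trim();
      byShortName.put(shortName, className);
    }
    return byShortName;
  }

  public static void main(String[] args) {
    // The remapping proposed in this issue: "lda" now launches CVB0Driver.
    String props =
        "org.apache.mahout.clustering.lda.cvb.CVB0Driver = lda : LDA via CVB0\n";
    System.out.println(aliases(props).get("lda"));
  }
}
```

Under these assumptions, the only change users would notice is which class "lda" launches. No package move is needed, which is the point of doing it this way rather than promoting the cvb package.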

One thing that goes away entirely is the LDAPrintTopics class, but it's replaced by simply
running VectorDumper with the -sort option on the model files, which is more standard anyway.
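The idea behind a sorted dump of topics can be sketched in plain Java. The sketch treats one topic as a term-to-weight map and lists its top-k terms in descending weight order. The in-memory map, the class TopicTermsSketch, and its topTerms() method are hypothetical stand-ins for one row of the topic model. VectorDumper's real options, input format, and output format are not reproduced here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: the top-k terms of one topic, sorted by descending weight. */
public class TopicTermsSketch {

  static List<String> topTerms(Map<String, Double> topic, int k) {
    List<Map.Entry<String, Double>> entries = new ArrayList<>(topic.entrySet());
    // Highest-weight terms first, as a sorted dump would list them.
    entries.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()));
    List<String> result = new ArrayList<>();
    for (int i = 0; i < Math.min(k, entries.size()); i++) {
      result.add(entries.get(i).getKey());
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up weights for a single illustrative topic.
    Map<String, Double> topic = Map.of("hadoop", 0.12, "vector", 0.30, "cluster", 0.21);
    System.out.println(topTerms(topic, 2)); // [vector, cluster]
  }
}
```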
