mahout-dev mailing list archives

From "Jeff Eastman (Updated) (JIRA)" <>
Subject [jira] [Updated] (MAHOUT-845) Make cluster top terms code more reusable
Date Wed, 11 Jan 2012 17:53:39 GMT


Jeff Eastman updated MAHOUT-845:

    Fix Version/s: 0.7 (was: 0.6)

I downloaded the latest patch and it no longer applies cleanly. Given the late date
w.r.t. the 0.6 code freeze and the lack of an assignee, I'm moving the issue to release 0.7.
> Make cluster top terms code more reusable
> -----------------------------------------
>                 Key: MAHOUT-845
>                 URL:
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.5
>            Reporter: Frank Scholten
>            Priority: Minor
>             Fix For: 0.7
>         Attachments: MAHOUT-845.patch, MAHOUT-845.patch, MAHOUT-845.patch
> When working with Mahout text clustering, I find that I keep writing code similar to the
> contents of
> public static String getTopFeatures(Cluster cluster, String[] dictionary, int numTerms)
> in ClusterDumper in order to determine cluster labels.
> I think it would be useful if (parts of) this code were added to the cluster or vector
> API so that you could do something like
> Cluster cluster = ... // get the cluster from seq file iterable
> String clusterLabel = cluster.getTopTerms(1, dictionary); // Do something with the label
> I think this would make it easier to export and post-process clustering results, for
> example by indexing them or storing them elsewhere.
> Thoughts?
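
For illustration only, the kind of helper the issue asks for might look roughly like the
sketch below. It is not the code in ClusterDumper or in any attached patch. The class name
TopTermsSketch, the method topTerms, and the use of a plain double[] for the cluster
centroid weights are all assumptions made to keep the example self-contained; a real
version would work against Mahout's Cluster and Vector types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TopTermsSketch {

    // Hypothetical helper, not part of the Mahout API.
    // weights[i] is assumed to be the centroid weight of the term dictionary[i].
    public static List<String> topTerms(double[] weights, String[] dictionary, int numTerms) {
        // Collect the indices of terms that actually occur in the centroid.
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < weights.length && i < dictionary.length; i++) {
            if (weights[i] != 0.0) {
                indices.add(i);
            }
        }

        // Sort the indices by weight, heaviest term first.
        indices.sort(Comparator.comparingDouble((Integer i) -> weights[i]).reversed());

        // Map the first numTerms indices back to their dictionary terms.
        List<String> terms = new ArrayList<>();
        for (int k = 0; k < Math.min(numTerms, indices.size()); k++) {
            terms.add(dictionary[indices.get(k)]);
        }
        return terms;
    }

    public static void main(String[] args) {
        String[] dictionary = {"apple", "banana", "cherry", "date"};
        double[] centroid = {0.1, 0.7, 0.0, 0.4};
        System.out.println(topTerms(centroid, dictionary, 2)); // prints [banana, date]
    }
}
```

Returning a list of terms rather than a preformatted string is a design choice of this
sketch; it leaves it to the caller to build a label, index the terms, or store them elsewhere.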


