mahout-dev mailing list archives

From "Hudson (JIRA)" <>
Subject [jira] Commented: (MAHOUT-401) Use NamedVector in seq2sparse
Date Sat, 25 Sep 2010 05:36:32 GMT


Hudson commented on MAHOUT-401:

Integrated in Mahout-Quality #326 (See [])
    MAHOUT-401: Use NamedVector in seq2sparse
Adds the -nv option to SparseVectorFromSequenceFiles to create NamedVectors instead of RandomAccess
or SequentialAccess vectors.
Enhances DictionaryVectorizerTest to assert that the proper vector types are generated.
Adds SparseVectorFromSequenceFilesTest to validate the proper command-line option behavior
and vector types.
Extracts random document generation code to the RandomDocumentGenerator utility class.
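
As a rough illustration of why named vectors matter here, the sketch below shows a vector wrapper that carries a document id alongside its weights, so a downstream consumer can still print the id. It is not Mahout code: SparseVec, NamedVec, and describe are hypothetical stand-ins invented for this example, and the real NamedVector class may differ in shape and behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NamedVectorSketch {

    /** Hypothetical stand-in for a sparse vector: term index -> weight. */
    static class SparseVec {
        final Map<Integer, Double> weights = new LinkedHashMap<>();

        SparseVec set(int index, double value) {
            weights.put(index, value);
            return this;
        }
    }

    /** Hypothetical wrapper pairing a vector with the id of its source document. */
    static class NamedVec {
        final String name;
        final SparseVec delegate;

        NamedVec(String name, SparseVec delegate) {
            this.name = name;
            this.delegate = delegate;
        }
    }

    /** Mimics a dumper: prints the document id only when the point carries one. */
    static String describe(Object point) {
        if (point instanceof NamedVec) {
            NamedVec named = (NamedVec) point;
            return named.name + " => " + named.delegate.weights;
        }
        // A bare vector has no label, so the original document id is lost.
        return "(unnamed) => " + ((SparseVec) point).weights;
    }

    public static void main(String[] args) {
        SparseVec vector = new SparseVec().set(3, 1.0).set(7, 2.5);
        System.out.println(describe(vector));
        System.out.println(describe(new NamedVec("doc-42", vector)));
    }
}
```

In this sketch the unnamed point can only be reported by its weights, while the wrapped point keeps "doc-42" attached through to the output, which is the behavior the issue asks seq2sparse to preserve.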

> Use NamedVector in seq2sparse
> -----------------------------
>                 Key: MAHOUT-401
>                 URL:
>             Project: Mahout
>          Issue Type: Bug
>          Components: Utils
>    Affects Versions: 0.4
>            Reporter: Drew Farris
>            Assignee: Drew Farris
>             Fix For: 0.4
>         Attachments: MAHOUT-401.patch, MAHOUT-401.patch, pv.patch
> In seq2sparse, TFIDFPartialVectorReducer and TFPartialVectorReducer should write NamedVectors.
> It appears that a lack of labels on the vector input to k-means at least breaks the cluster-dumper
> in the sense that it no longer prints the original document ids for points.
> See:
> I wonder if this is also an issue with the code that generates vectors from lucene indexes?

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
