mahout-dev mailing list archives

From "Robin Anil (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAHOUT-962) minDF and maxDFPercent filtering doesn't get applied when output weight is tf in SparseVectorsFromSequenceFiles
Date Sun, 02 Jun 2013 14:58:20 GMT

     [ https://issues.apache.org/jira/browse/MAHOUT-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robin Anil updated MAHOUT-962:
------------------------------

    Resolution: Fixed
      Assignee: Robin Anil
        Status: Resolved  (was: Patch Available)

Submitted to SVN
                
> minDF and maxDFPercent filtering doesn't get applied when output weight is tf in SparseVectorsFromSequenceFiles
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-962
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-962
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.6, 0.7, 0.8
>            Reporter: John Conwell
>            Assignee: Robin Anil
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.8
>
>         Attachments: mahout_962.patch
>
>
> The reasoning here is similar to that behind the fix for MAHOUT-957. The desired output
> is term frequency (tf) vectors, but I want terms filtered by their min and max DF values.
> This can be useful for LDA, where tf vectors are the desired input but filtering out terms
> above maxDFPercent still helps.
> Currently minDF and maxDFPercent are only used when calculating tfidf, and the original
> tf vectors are not updated to reflect the term filtering.
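
To illustrate the behaviour requested above, here is a minimal, self-contained Java sketch of document-frequency pruning applied directly to tf vectors. It is not the attached patch and does not use Mahout's API: the class `TfPruningSketch`, its method names, the map-based vector representation, and the exact threshold semantics (keep a term if it occurs in at least `minDf` documents and in at most `maxDfPercent` percent of them) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; hypothetical names, not Mahout's implementation or API.
class TfPruningSketch {

    // Counts, for each term, how many documents contain it (its document frequency).
    static Map<String, Integer> documentFrequencies(List<Map<String, Integer>> tfVectors) {
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Integer> doc : tfVectors) {
            for (String term : doc.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Returns copies of the tf vectors with out-of-range terms removed.
    // Assumed semantics: keep a term if df >= minDf and df <= maxDfPercent% of all documents.
    // The surviving term frequencies are left unchanged (no idf weighting is applied).
    static List<Map<String, Integer>> pruneByDf(
            List<Map<String, Integer>> tfVectors, int minDf, int maxDfPercent) {
        Map<String, Integer> df = documentFrequencies(tfVectors);
        long numDocs = tfVectors.size();
        List<Map<String, Integer>> pruned = new ArrayList<>();
        for (Map<String, Integer> doc : tfVectors) {
            Map<String, Integer> kept = new HashMap<>();
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                int termDf = df.get(e.getKey());
                boolean frequentEnough = termDf >= minDf;
                boolean notTooCommon = termDf * 100L <= (long) maxDfPercent * numDocs;
                if (frequentEnough && notTooCommon) {
                    kept.put(e.getKey(), e.getValue());
                }
            }
            pruned.add(kept);
        }
        return pruned;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> docs = List.of(
                Map.of("the", 5, "mahout", 2),
                Map.of("the", 3, "lda", 1),
                Map.of("the", 4, "mahout", 1),
                Map.of("the", 2, "vector", 1));
        // With minDf = 2 and maxDfPercent = 75, only "mahout" survives:
        // "the" occurs in 100% of documents; "lda" and "vector" occur in only one.
        System.out.println(pruneByDf(docs, 2, 75));
    }
}
```

In this toy corpus the surviving vectors still carry raw term counts, which is the point of the request: the filtering is decoupled from tfidf weighting, so the output remains usable as tf input for something like LDA.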

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
