mahout-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-521) Add option to DictionaryVectorizer to create (tf and tfidf) vectors on-the-fly using a given dictionary
Date Sun, 03 Oct 2010 20:08:34 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917388#action_12917388 ]

Hudson commented on MAHOUT-521:
-------------------------------

Integrated in Mahout-Quality #367 (See [https://hudson.apache.org/hudson/job/Mahout-Quality/367/])
    MAHOUT-521 Moving vectorizer to core


> Add option to DictionaryVectorizer to create (tf and tfidf) vectors on-the-fly using a given dictionary
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-521
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-521
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.4
>
>         Attachments: MAHOUT-vectorizer-move.patch, MAHOUT-vectorizer-move.patch
>
>
> The current dictionary vectorizer takes a set of text files, creates the dictionary, and converts them to text vectors. In a classification scenario, the vectorizer needs to take an already existing dictionary and use its ids to convert text to vectors, and optionally do the following:
> 1. Choose between tf|tfidf weights (this needs the document frequency as an input)
> 2. Add new words to the dictionary and provide options to write it to disk and read it back
> 3. Add an option to normalize/lognormalize
>  
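The behaviour requested above can be illustrated with a minimal sketch. This is not Mahout's API or the attached patch: the function name `vectorize`, its parameters, and the smoothed idf formula are all assumptions chosen for illustration. Only plain p-norm normalization is shown; lognormalize and dictionary persistence are left out.

```python
import math
from collections import Counter


def vectorize(tokens, dictionary, doc_freq=None, num_docs=None,
              weighting="tf", norm=None, add_new_words=False):
    """Convert tokens to a sparse {term_id: weight} vector using an
    existing dictionary (hypothetical helper, not a Mahout call)."""
    if add_new_words:
        # Assumption: ids are dense (0..n-1), so the next free id is len(dictionary).
        for tok in tokens:
            if tok not in dictionary:
                dictionary[tok] = len(dictionary)

    # Tokens missing from the dictionary are silently dropped.
    counts = Counter(dictionary[t] for t in tokens if t in dictionary)

    vector = {}
    for term_id, tf in counts.items():
        if weighting == "tfidf":
            # Assumption: add-one smoothed idf; other idf variants exist.
            df = doc_freq.get(term_id, 0)
            vector[term_id] = tf * math.log((num_docs + 1) / (df + 1))
        else:
            vector[term_id] = float(tf)

    if norm is not None:
        # Plain p-norm scaling, e.g. norm=2 gives unit Euclidean length.
        length = sum(abs(w) ** norm for w in vector.values()) ** (1.0 / norm)
        if length > 0:
            vector = {i: w / length for i, w in vector.items()}
    return vector
```

For example, with the dictionary `{"apple": 0, "banana": 1}`, the tokens `["apple", "apple", "banana", "cherry"]` yield the tf vector `{0: 2.0, 1: 1.0}`; "cherry" is dropped unless `add_new_words=True`, in which case it is assigned id 2.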

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

