mahout-dev mailing list archives

From "Richard Scharrer (JIRA)" <>
Subject [jira] [Commented] (MAHOUT-1549) Extracting tfidf-vectors by key
Date Sun, 18 May 2014 21:53:04 GMT


Richard Scharrer commented on MAHOUT-1549:

Yes! has the solution.

> Extracting tfidf-vectors by key
> -------------------------------
>                 Key: MAHOUT-1549
>                 URL:
>             Project: Mahout
>          Issue Type: Question
>          Components: Classification
>    Affects Versions: 0.7, 0.8, 0.9
>            Reporter: Richard Scharrer
>              Labels: documentation, features, newbie
>             Fix For: 0.7, 0.8, 0.9
> Hi,
> I have about 200000 tfidf-vectors and I need to extract 500 of them, for which I have the
> keys. Is there some kind of magical option that lets me take the output
> of mahout seqdumper and transform it back into a sequencefile that I can use for trainnb/testnb?
> The sequencefiles of tfidf use the Text class for the keys and the VectorWritable class for
> the values. I tried 
> with different settings, but the output always gives me the Text class for both key and
> value, which can't be used in trainnb and testnb.
> I posted this question on:
> I am asking here because I've seen similar questions on Stack Overflow that
> were asked last year and still haven't been answered.
> I really need this information so in case you know anything please tell me.
> Regards,
> Richard

This message was sent by Atlassian JIRA
