mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: CVB outputs very few topics
Date Tue, 23 Apr 2013 16:54:02 GMT
You don't need to set the -nt (number of terms) parameter if you pass in the
dictionary directly, as the term count is inferred when the dictionary is loaded.

This issue of only seeing one or two topics is very strange, however; I've
never seen that myself.

Can you list the contents of the cvb-output directory (and its
subdirectories)?
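
Something along these lines would do for the listing. This is only a sketch: it assumes the job wrote to the local filesystem under cvb-output (the -o value in the command quoted below), and list_cvb_output is a made-up helper name, not anything Mahout ships.

```shell
# Sketch only: list_cvb_output is a hypothetical helper, not part of Mahout.
# It recursively lists a local output directory with file sizes, so that
# missing or zero-byte part files are easy to spot.
list_cvb_output() {
  dir="${1:-cvb-output}"   # default matches the -o value in the quoted command
  if [ ! -d "$dir" ]; then
    echo "no such local directory: $dir" >&2
    return 1
  fi
  ls -lR "$dir"
}
```

Called as `list_cvb_output cvb-output`. If the job wrote to HDFS instead, `hadoop fs -ls -R cvb-output` (or `hadoop fs -lsr` on older Hadoop releases) gives the equivalent listing.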


On Tue, Apr 23, 2013 at 6:15 AM, Jack Pay <jp242@sussex.ac.uk> wrote:

> The value k will dictate how many topics are output.
>
> There should be no more or less than that.
>
> In your cvb output there should be as many (term | topic) distributions as
> there are topics, and in the (document | topic) distributions there should
> be as many vectors as there are documents,
> i.e. cvb_out = topic X term matrix
> doc_top_out = doc X topic matrix
>
> The problem may well have occurred at the pre-processing stage. Have you
> checked that your input matrix is correct?
>
> You also need to set the number of terms parameter, which I cannot see
> here:
> -nt
>
> Hope this helps
>
> Jack
>
> On 23 Apr 2013, at 12:52, Chris Harrington wrote:
>
> > I need some help understanding CVB.
> >
> > I'm running it over a small data set of text documents (~10000) which
> > contain short text snippets of around 100 - 200 characters or 10 - 50 words.
> >
> > When I run CVB I only get 1 or 2 topics in the output.
> >
> > Here's the command I'm using; it resulted in only 1 topic.
> >
> > bin/mahout cvb -i ./contentDataDir/matrix/matrix -o cvb-output -k 10 -x 10 -dict ./contentDataDir/sparseVectors/dictionary.file-0 -dt cvb-topic-doc -mt cvb-topic-model
> >
> > Is this due to having a poor dataset or is there some parameter I could
> > use to get more topics?
>
>


-- 

  -jake
