mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: LDA in Mahout
Date Thu, 03 Feb 2011 17:08:20 GMT
I agree here.  Perplexity is probably the best measure of whether LDA is
still capturing the information it needs.
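
For what it's worth, here is a minimal sketch (plain Java, not anything in
Mahout) of a held-out perplexity sweep over k. HeldOutScorer, pickK, and the
candidate k values are hypothetical placeholders for whatever actually trains
the model and scores the held-out documents; the formula itself,
perplexity = exp(-log-likelihood / token count), is the one from Section 7.1
of Blei's paper.

  // Minimal sketch, not a Mahout API: pick k by held-out perplexity,
  // perplexity = exp(-logLikelihood / tokenCount) (Blei et al. 2003, Sec. 7.1).
  public class PerplexitySweep {

    /** Hypothetical hook: train LDA with numTopics topics and return the
     *  log-likelihood of a fixed held-out document set under that model. */
    interface HeldOutScorer {
      double heldOutLogLikelihood(int numTopics);
    }

    static double perplexity(double heldOutLogLikelihood, long heldOutTokenCount) {
      return Math.exp(-heldOutLogLikelihood / heldOutTokenCount);
    }

    static int pickK(HeldOutScorer scorer, long heldOutTokenCount, int[] candidateKs) {
      int bestK = candidateKs[0];
      double best = Double.POSITIVE_INFINITY;
      for (int k : candidateKs) {
        double p = perplexity(scorer.heldOutLogLikelihood(k), heldOutTokenCount);
        System.out.printf("k=%d  perplexity=%.1f%n", k, p);
        if (p < best) {        // lower held-out perplexity = better generalization
          best = p;
          bestK = k;
        }
      }
      return bestK;
    }
  }

In practice one would probably look for the point where perplexity stops
improving rather than blindly taking the minimum, but the bookkeeping is the
same.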

On Thu, Feb 3, 2011 at 8:58 AM, Federico Castanedo <fcastane@inf.uc3m.es> wrote:

> Hi,
>
> I joined this discussion a bit late, but what about the perplexity measure
> reported in Section 7.1 of Blei's LDA paper? It seems to be the metric
> commonly used to find the best value of "k" (topics) when training an LDA
> model.
>
> Best,
> Federico
>
> 2011/1/4 Jake Mannix <jake.mannix@gmail.com>
>
> > Saying we have hashing is different than saying we know what will happen
> > to an algorithm once it's running over hashed features (as the continuing
> > work on our Stochastic SVD demonstrates).
> >
> > I can certainly try to run LDA over a hashed vector set, but I'm not sure
> > what criteria for correctness / quality of the topic model I should use
> > if I do.
> >
> >  -jake
> >
> > On Jan 4, 2011 7:21 AM, "Robin Anil" <robin.anil@gmail.com> wrote:
> >
> > We already have the second part - the hashing trick, thanks to Ted, and
> > he also has a mechanism to partially reverse-engineer the features. You
> > might be able to drop it directly into the job itself, or even vectorize
> > and then run LDA.
> >
> > Robin
> >
> > On Tue, Jan 4, 2011 at 8:44 PM, Jake Mannix <jake.mannix@gmail.com>
> > wrote:
> > >
> > > Hey Robin,
> > >
> > > Vowp...
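
Coming back to Robin's hashing-trick point in the quoted thread: below is a
minimal sketch of the feature-hashing idea itself (plain Java, not Mahout's
actual encoder classes; HashedVectorizerSketch and hashToVector are made-up
names), just to make explicit what LDA would be consuming. Collisions between
terms that land in the same bucket are exactly why Jake's question about
quality criteria for the resulting topic model matters.

  import java.util.Arrays;
  import java.util.List;

  // Minimal sketch of the feature-hashing idea (not Mahout's encoder classes):
  // each token is hashed straight into a fixed-size count vector, so no
  // dictionary-building pass is needed before running LDA over the vectors.
  public class HashedVectorizerSketch {

    static double[] hashToVector(List<String> tokens, int numFeatures) {
      double[] vector = new double[numFeatures];
      for (String token : tokens) {
        // Non-negative bucket index derived from the token's hash.
        int index = (token.hashCode() & Integer.MAX_VALUE) % numFeatures;
        vector[index] += 1.0;  // accumulate term counts per hashed bucket
      }
      return vector;
    }

    public static void main(String[] args) {
      double[] v = hashToVector(
          Arrays.asList("latent", "dirichlet", "allocation", "latent"), 1 << 10);
      System.out.println("non-zero buckets: "
          + Arrays.stream(v).filter(x -> x != 0).count());
    }
  }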
