mahout-user mailing list archives

From Federico Castanedo <fcast...@inf.uc3m.es>
Subject Re: LDA in Mahout
Date Thu, 03 Feb 2011 16:58:22 GMT
Hi,

I'm joining this discussion a bit late, but what about the perplexity
measure reported in section 7.1 of Blei's LDA paper? It seems to be the
metric most commonly used to choose the best value of "k" (the number of
topics) when training an LDA model.
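
For reference, a minimal sketch of that measure, assuming you already have a
per-document log-likelihood log p(w_d) on a held-out set from whatever LDA
implementation you trained. Perplexity is exp(-sum_d log p(w_d) / sum_d N_d),
where N_d is the number of tokens in document d; lower is better. The function
names and the `held_out_scores` layout below are hypothetical, not part of
Mahout.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """Held-out perplexity: exp(-sum_d log p(w_d) / sum_d N_d)."""
    total_tokens = sum(doc_lengths)
    if total_tokens == 0:
        raise ValueError("held-out set contains no tokens")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


def best_k(held_out_scores):
    """Pick the topic count with the lowest held-out perplexity.

    held_out_scores (hypothetical layout) maps each candidate k to a pair
    (doc_log_likelihoods, doc_lengths) measured on the same held-out set.
    """
    perplexities = {
        k: perplexity(log_liks, lengths)
        for k, (log_liks, lengths) in held_out_scores.items()
    }
    chosen = min(perplexities, key=perplexities.get)
    return chosen, perplexities
```

As a sanity check, a model that assigns uniform probability over a vocabulary
of V words should come out with a perplexity of V. The comparison across
values of k only makes sense if every model is scored on the same held-out
documents.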

Best,
Federico

2011/1/4 Jake Mannix <jake.mannix@gmail.com>

> Saying we have hashing is different than saying we know what will happen to
> an algorithm once it's running over hashed features (as the continuing work
> on our Stochastic SVD demonstrates).
>
> I can certainly try to run LDA over a hashed vector set, but I'm not sure
> what criteria for correctness / quality of the topic model I should use if
> I do.
>
>  -jake
>
> On Jan 4, 2011 7:21 AM, "Robin Anil" <robin.anil@gmail.com> wrote:
>
> We already have the second part - the hashing trick. Thanks to Ted, and he
> has a mechanism to partially reverse engineer the feature as well. You
> might be able to drop it directly in the job itself or even vectorize and
> then run LDA.
>
> Robin
>
> On Tue, Jan 4, 2011 at 8:44 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
> >
> Hey Robin, > > Vowp...
>
